# Numpy Arrays

**Getting started**

Numpy arrays are a great alternative to Python lists. Some of their key advantages are that they:
1. are fast,
2. are easy to work with, and
3. let you perform calculations across entire arrays.

In the following example, you will first create two Python lists. Then, you will import the numpy package and create numpy arrays out of the newly created lists.

In [None]:
# Create 2 new lists height and weight
height = [1.87,  1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

# Import the numpy package as np
import numpy as np

# Create 2 numpy arrays from height and weight
np_height = np.array(height)
np_weight = np.array(weight)

**Print out the type of np_height**

In [None]:
print(type(np_height))

<class 'numpy.ndarray'>


**Element-wise calculations**

Now you can perform element-wise calculations on np_height and np_weight. For example, you can take all 6 of the height and weight observations above and calculate the BMI for each observation with a single expression. Because the arithmetic runs over whole arrays rather than in a Python loop, these operations are fast, which is particularly helpful when your data contains thousands of observations.

In [None]:
# Calculate bmi
bmi = np_weight / np_height ** 2

# Print the result
print(bmi)

[23.34925219 27.88755755 28.75558507 25.48723993 23.87257618 25.84368152]


In [None]:
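# Try the same calculation on the plain Python lists
# (this raises a TypeError: lists do not support element-wise arithmetic)
bmi = weight / height ** 2

# Print the result (never reached because of the error above)
print(bmi)

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'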
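
The list version fails because Python lists do not define operators such as `**` or `/` for element-wise math. As a rough sketch for comparison, a pure-Python equivalent needs an explicit loop over paired values, while the Numpy version stays a single expression:

```python
import numpy as np

height = [1.87, 1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

# Plain lists: pair up the values and compute each BMI one at a time
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# Numpy arrays: the same calculation as a single vectorized expression
bmi_array = np.array(weight) / np.array(height) ** 2

# Both approaches agree up to floating-point rounding
print(np.allclose(bmi_list, bmi_array))  # True
```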

**Subsetting**

Another great feature of Numpy arrays is the ability to subset. For instance, if you wanted to know which observations in the BMI array are above 25, you could quickly subset it to find out.

In [None]:
# Get a boolean array: True where bmi is above 25
bmi > 25

array([False,  True,  True,  True, False,  True])

In [None]:
# Print only those observations above 25
bmi[bmi > 25]

array([27.88755755, 28.75558507, 25.48723993, 25.84368152])
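
The comparison produces an ordinary boolean array, so it can be stored and reused. The sketch below rebuilds the arrays from the earlier cells; the name `above_25` is introduced here only for illustration.

```python
import numpy as np

np_height = np.array([1.87, 1.87, 1.82, 1.91, 1.90, 1.85])
np_weight = np.array([81.65, 97.52, 95.25, 92.98, 86.18, 88.45])
bmi = np_weight / np_height ** 2

# Store the boolean mask so it can be reused
above_25 = bmi > 25

# True counts as 1, so summing the mask counts the matching observations
print(above_25.sum())

# The same mask subsets any array of the same length
print(np_height[above_25])

# Combine conditions with & (and) or | (or); each condition needs parentheses
print(bmi[(bmi > 25) & (np_height > 1.85)])
```

Note that Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.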

**Exercise**

First, convert the list of weights from a list to a Numpy array. Then, convert all of the weights from kilograms to pounds. Use the scalar conversion of 2.2 lbs per kilogram to make your conversion. Lastly, print the resulting array of weights in pounds.

In [None]:
weight_kg = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

import numpy as np

# Create a numpy array np_weight_kg from weight_kg
np_weight_kg = np.array(weight_kg)

# Create np_weight_lbs from np_weight_kg
np_weight_lbs = np_weight_kg * 2.2

# Print out np_weight_lbs
print(np_weight_lbs)

[179.63  214.544 209.55  204.556 189.596 194.59 ]


# Pandas Basics
# Pandas DataFrames
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the Numpy package and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.

There are several ways to create a DataFrame. One way is to use a dictionary. For example:

In [None]:
# Use a name other than dict, which would shadow Python's built-in type
brics_dict = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
              "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
              "area": [8.516, 17.10, 3.286, 9.597, 1.221],
              "population": [200.4, 143.5, 1252, 1357, 52.98]}

print(brics_dict)

import pandas as pd
brics = pd.DataFrame(brics_dict)
print(brics)

{'country': ['Brazil', 'Russia', 'India', 'China', 'South Africa'], 'capital': ['Brasilia', 'Moscow', 'New Delhi', 'Beijing', 'Pretoria'], 'area': [8.516, 17.1, 3.286, 9.597, 1.221], 'population': [200.4, 143.5, 1252, 1357, 52.98]}
        country    capital    area  population
0        Brazil   Brasilia   8.516      200.40
1        Russia     Moscow  17.100      143.50
2         India  New Delhi   3.286     1252.00
3         China    Beijing   9.597     1357.00
4  South Africa   Pretoria   1.221       52.98


As you can see in the new brics DataFrame, Pandas has assigned each row an integer index from 0 through 4. If you would like different index values, say, a two-letter code for each country, you can set them easily as well.

In [None]:
# Set the index for brics
brics.index = ["BR", "RU", "IN", "CH", "SA"]

# Print out brics with new index values
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
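
The index can also be supplied when the DataFrame is created, or taken from an existing column. The following is a minimal sketch using a shortened version of the BRICS data; `by_country` is a name introduced here for illustration.

```python
import pandas as pd

# Pass the row labels directly through the index parameter
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
     "area": [8.516, 17.10, 3.286, 9.597, 1.221]},
    index=["BR", "RU", "IN", "CH", "SA"],
)
print(brics)

# Alternatively, promote an existing column to the index
by_country = brics.set_index("country")
print(by_country.loc["India", "area"])  # 3.286
```

`set_index` returns a new DataFrame and leaves `brics` unchanged, which is why the result is assigned to a separate name.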


Another way to create a DataFrame is by importing a csv file using Pandas. In this example, the file cars.csv is stored on Google Drive, so the drive is mounted first and the file is then imported using pd.read_csv:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
cars = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/cars.csv')

# Print out cars
print(cars)

  name  cars_per_cap        country  drives_right
0   US           809  United States          True
1  AUS           731      Australia         False
2  JAP           588          Japan         False
3   IN            18          India         False
4   RU           200         Russia          True
5  MOR            70        Morocco          True
6   EG            45          Egypt          True
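
If cars.csv is not available on Google Drive, the behavior of pd.read_csv can be tried on an in-memory copy. This sketch assumes the file's contents match the printed output above: the CSV text is reconstructed from that output, not taken from the actual file, and `io.StringIO` stands in for a file path.

```python
import io
import pandas as pd

# CSV text reconstructed from the printed output (an assumption about the file)
csv_text = """name,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True
"""

# Default: pandas adds an integer index 0..6 and keeps "name" as a column
cars = pd.read_csv(io.StringIO(csv_text))
print(cars.shape)  # (7, 4)

# index_col=0 uses the first column as the row labels instead
cars_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars_indexed.shape)  # (7, 3)
print(cars_indexed.index.tolist())
```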


**Indexing DataFrames**

There are several ways to index a Pandas DataFrame. One of the easiest ways to do this is by using square bracket notation.

In the example below, you can use square brackets to select one column of the cars DataFrame. You can either use a single bracket or a double bracket. The single bracket will output a Pandas Series, while a double bracket will output a Pandas DataFrame.



In [None]:
# Import pandas and cars.csv
import pandas as pd
cars = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/cars.csv', index_col = 0)

# Print out cars_per_cap column as Pandas Series
print(cars['cars_per_cap'])

# Print out cars_per_cap column as Pandas DataFrame
print(cars[['cars_per_cap']])

# Print out DataFrame with cars_per_cap and country columns
print(cars[['cars_per_cap', 'country']])

name
US     809
AUS    731
JAP    588
IN      18
RU     200
MOR     70
EG      45
Name: cars_per_cap, dtype: int64
      cars_per_cap
name              
US             809
AUS            731
JAP            588
IN              18
RU             200
MOR             70
EG              45
      cars_per_cap        country
name                             
US             809  United States
AUS            731      Australia
JAP            588          Japan
IN              18          India
RU             200         Russia
MOR             70        Morocco
EG              45          Egypt
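
The difference between single and double brackets can be confirmed by checking the type of each result. This is a small sketch on a three-row stand-in for the cars data, built inline so that it runs without the csv file:

```python
import pandas as pd

cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588],
     "country": ["United States", "Australia", "Japan"]},
    index=["US", "AUS", "JAP"],
)

single = cars["cars_per_cap"]    # one column label -> Series
double = cars[["cars_per_cap"]]  # list of column labels -> DataFrame

print(type(single).__name__)       # Series
print(type(double).__name__)       # DataFrame
print(single.shape, double.shape)  # (3,) (3, 1)
```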


Square brackets can also be used to access observations (rows) from a DataFrame. For example:

In [None]:
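# Quick ways to inspect a DataFrame; in a notebook cell only the
# result of the last expression (here, count) is displayed
cars.head()   # first 5 rows
cars.tail()   # last 5 rows
cars.count()  # number of non-missing values per column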

cars_per_cap    7
country         7
drives_right    7
dtype: int64

In [None]:
# Import cars data
import pandas as pd
cars = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/cars.csv', index_col = 0)

# Print out first 4 observations
print(cars[0:4])

# Print out fifth, sixth, and seventh observations
print(cars[4:7])

      cars_per_cap        country  drives_right
name                                           
US             809  United States          True
AUS            731      Australia         False
JAP            588          Japan         False
IN              18          India         False
      cars_per_cap  country  drives_right
name                                     
RU             200   Russia          True
MOR             70  Morocco          True
EG              45    Egypt          True
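
Row slices in square brackets work by position, and the end position is excluded. The sketch below, on a four-row stand-in for the cars data, also shows that a bare integer is not a row selector:

```python
import pandas as pd

cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18],
     "country": ["United States", "Australia", "Japan", "India"]},
    index=["US", "AUS", "JAP", "IN"],
)

# A slice selects rows by position; the end position is excluded
first_two = cars[0:2]
print(first_two.index.tolist())  # ['US', 'AUS']

# iloc makes the same positional selection explicit
print(first_two.equals(cars.iloc[0:2]))  # True

# A bare integer is treated as a column label, not a row position
try:
    cars[0]
except KeyError:
    print("cars[0] raises KeyError: there is no column named 0")
```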


You can also use loc and iloc to perform just about any data selection operation. loc is label-based, which means that you specify rows and columns by their row and column labels. iloc is integer-position based, so you specify rows and columns by their integer position, as you did with the row slices in the previous example.

In [None]:
# Import cars data
import pandas as pd
cars = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/cars.csv', index_col = 0)

# Print out observation for Japan
print(cars.iloc[2])

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
      cars_per_cap    country  drives_right
name                                       
AUS            731  Australia         False
EG              45      Egypt          True
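
Both loc and iloc also accept a column selector after the row selector, separated by a comma. Here is a minimal sketch on a four-row stand-in for the cars data; `subset` is a name introduced for illustration.

```python
import pandas as pd

cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 45],
     "country": ["United States", "Australia", "Japan", "Egypt"],
     "drives_right": [True, False, False, True]},
    index=["US", "AUS", "JAP", "EG"],
)

# loc: row label, then column label
print(cars.loc["JAP", "country"])  # Japan

# iloc: row position, then column position (both start at 0)
print(cars.iloc[2, 1])  # Japan

# Lists select several rows and columns at once and return a DataFrame
subset = cars.loc[["AUS", "EG"], ["country", "drives_right"]]
print(subset)
```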
