# Pandas

The pandas DataFrame is a powerful tool for manipulating large datasets. Unlike a NumPy array, a DataFrame can hold columns of different types. pandas can also read CSV files directly.

In [16]:
import pandas as pd

cars = pd.read_csv(r"C:\Users\lrspe\Desktop\MS Data Science\5. Python for Data Science\cars.csv", index_col=0)

print(cars)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
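
Because `cars.csv` lives at a local path, the cell above will not run elsewhere. As a rough sketch, the same table can be rebuilt from a dict of columns using the values printed above. The names `data` and `cars` are just illustrative choices here, and the string column's reported dtype may vary with the pandas version.

```python
import pandas as pd

# Rebuild the table shown above from a dict of columns, so the example
# runs without the local cars.csv file.
data = {
    "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
    "country": ["United States", "Australia", "Japan", "India",
                "Russia", "Morocco", "Egypt"],
    "drives_right": [True, False, False, False, True, True, True],
}

# The row labels play the role of the first CSV column (index_col=0).
cars = pd.DataFrame(data, index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"])

print(cars)

# Each column keeps its own type: integers, strings, and booleans.
print(cars.dtypes)
```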


To select columns, use square brackets: single brackets return a pandas Series, while double brackets return a pandas DataFrame.

In [18]:
print(cars['cars_per_cap'])

US     809
AUS    731
JAP    588
IN      18
RU     200
MOR     70
EG      45
Name: cars_per_cap, dtype: int64
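
The cell above only shows the single-bracket form. The sketch below puts both forms side by side on a small copy of the table (rebuilt inline so it runs without the CSV file; the variable names `series`, `frame`, and `multi` are illustrative).

```python
import pandas as pd

# Small inline copy of the cars table, using values from the output above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JAP"],
)

series = cars["cars_per_cap"]             # single brackets: a Series
frame = cars[["cars_per_cap"]]            # double brackets: a one-column DataFrame
multi = cars[["country", "drives_right"]] # double brackets with several columns

print(type(series).__name__)  # Series
print(type(frame).__name__)   # DataFrame
print(multi)
```

The inner pair of brackets is an ordinary Python list of column names, which is why it can hold more than one column.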


Rows can be selected by label with `loc`. A single label returns that row as a Series (with the column names as its index), while a list of labels returns a sub-DataFrame. A second argument restricts the selection to particular columns.

In [20]:
print(cars.loc['JAP'])

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object


In [21]:
print(cars.loc[['US','AUS']])

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False


In [23]:
print(cars.loc[['MOR'], 'drives_right'])

print(cars.loc[['MOR', 'RU'], ['country', 'drives_right']])

MOR    True
Name: drives_right, dtype: bool
     country  drives_right
MOR  Morocco          True
RU    Russia          True
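
The same Series-versus-DataFrame distinction applies to rows. As a minimal sketch on an inline copy of the table (variable names are illustrative), a bare label gives a Series, a one-element list gives a one-row DataFrame, and a row label plus a column label gives a single value.

```python
import pandas as pd

# Small inline copy of the cars table, using values from the output above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JAP"],
)

row_series = cars.loc["JAP"]        # bare label: the row as a Series
row_frame = cars.loc[["JAP"]]       # list of labels: a one-row DataFrame
value = cars.loc["JAP", "country"]  # row label and column label: one value

print(type(row_series).__name__)  # Series
print(type(row_frame).__name__)   # DataFrame
print(value)                      # Japan
```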


Another useful tool is adding custom columns to the dataset, either from a list of values or computed from existing columns.

In [39]:
# Add a new column from a list, with one value per row
cars["gdp"] = ['High', 'High', 'High', 'Medium', 'Medium', 'Low', 'Low']
print(cars)

     cars_per_cap        country  drives_right     gdp
US            809  United States          True    High
AUS           731      Australia         False    High
JAP           588          Japan         False    High
IN             18          India         False  Medium
RU            200         Russia          True  Medium
MOR            70        Morocco          True     Low
EG             45          Egypt          True     Low


In [42]:
# Add a custom column "cars/100" by dividing "cars_per_cap" by 100
cars["cars/100"] = cars["cars_per_cap"] / 100
print(cars)

     cars_per_cap        country  drives_right     gdp  cars/100
US            809  United States          True    High      8.09
AUS           731      Australia         False    High      7.31
JAP           588          Japan         False    High      5.88
IN             18          India         False  Medium      0.18
RU            200         Russia          True  Medium      2.00
MOR            70        Morocco          True     Low      0.70
EG             45          Egypt          True     Low      0.45
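
One thing to watch when adding a column from a list: the list needs one value per row. The sketch below, on an inline copy of the table, shows pandas raising a `ValueError` when the lengths differ. The `error_message` variable is only there for illustration.

```python
import pandas as pd

# Small inline copy of the cars table, using values from the output above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
    },
    index=["US", "AUS", "JAP"],
)

error_message = None
try:
    # Three rows but only two values: pandas refuses the assignment.
    cars["gdp"] = ["High", "High"]
except ValueError as err:
    error_message = str(err)

print("Assignment failed:", error_message)

# A list with one value per row works as expected.
cars["gdp"] = ["High", "High", "High"]
print(cars)
```

A column computed from an existing one, as in the `cars/100` cell, avoids this issue because the result already has one value per row.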
