# Pandas
## Dictionary to DataFrame
Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country, and the columns give the country name (`country`), whether people drive on the right (`drives_right`), and the number of vehicles per capita (`cpc`).

```python
# Import pandas as pd
import pandas as pd

# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Create dictionary my_dict with three key:value pairs, reusing the lists above
my_dict = {
    'country': names,
    'drives_right': dr,
    'cpc': cpc,
}

# Build a DataFrame cars from my_dict
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)
```

```
         country  drives_right  cpc
0  United States          True  809
1      Australia         False  731
2          Japan         False  588
3          India         False   18
4         Russia          True  200
5        Morocco          True   70
6          Egypt          True   45
```
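To see what `pd.DataFrame` did with the dictionary, you can inspect the result directly. This minimal sketch rebuilds the same `cars` DataFrame so it runs on its own. It shows that each key became a column label (in the dictionary's insertion order), each list became that column's values, and the row labels defaulted to the integers 0 to 6. This kind of input assumes all three lists have the same length; `pd.DataFrame` raises a `ValueError` if they don't.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars = pd.DataFrame({'country': names, 'drives_right': dr, 'cpc': cpc})

# Each dictionary key became a column label, in insertion order
print(list(cars.columns))   # ['country', 'drives_right', 'cpc']
# 7 rows (one per list element) by 3 columns (one per key)
print(cars.shape)           # (7, 3)
# Without explicit labels, the rows are numbered from 0
print(cars.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
# pandas inferred one dtype per column from the list contents
print(cars.dtypes)
```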


## Changing index values
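Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6? You can replace them with more meaningful labels by assigning a list with one label per row to the DataFrame's `index` attribute.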

```python
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars with its new row labels
print(cars)
```

```
           country  drives_right  cpc
US   United States          True  809
AUS      Australia         False  731
JPN          Japan         False  588
IN           India         False   18
RU          Russia          True  200
MOR        Morocco          True   70
EG           Egypt          True   45
```
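Instead of overwriting `cars.index` afterwards, you can also supply the row labels when the DataFrame is built, through the `index` parameter of `pd.DataFrame`. This is a minimal sketch; the data is repeated so the block runs on its own. It also shows that assigning a list with the wrong number of labels to `.index` is rejected with a `ValueError`.

```python
import pandas as pd

data = {
    'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    'drives_right': [True, False, False, False, True, True, True],
    'cpc': [809, 731, 588, 18, 200, 70, 45],
}
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Supply the row labels at construction time through the index parameter
cars = pd.DataFrame(data, index=row_labels)
built_labels = cars.index.tolist()
print(built_labels)  # ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Assigning a list with the wrong number of labels to .index raises ValueError
mismatch_rejected = False
try:
    cars.index = row_labels[:3]
except ValueError as err:
    mismatch_rejected = True
    print('ValueError:', err)
```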


## Square Brackets
Pandas DataFrames can be indexed and subset in many different ways. The simplest, though not the most powerful, is square brackets: a single column label returns a Pandas Series, while a list of column labels returns a Pandas DataFrame.

```python
# Print out country column as Pandas Series
print(cars['country'])
```

```
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
```


```python
# Print out country column as Pandas DataFrame
print(cars[['country']])
```

```
           country
US   United States
AUS      Australia
JPN          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
```
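The only difference between the two cells above is single versus double brackets, so it can help to check the result types explicitly. This minimal sketch rebuilds `cars` so it runs on its own, confirms that one label gives a Series and a list of labels gives a DataFrame, and shows that the columns come back in the order you list them.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

single = cars['country']      # a single label -> Series
double = cars[['country']]    # a list holding one label -> DataFrame
print(type(single).__name__, single.shape)  # Series (7,)
print(type(double).__name__, double.shape)  # DataFrame (7, 1)

# With a list of labels, columns are returned in the order listed
reordered = cars[['cpc', 'country']]
print(reordered.columns.tolist())  # ['cpc', 'country']
```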


```python
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
```

```
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
```


##### Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. 

```python
# Print out first 3 observations
print(cars[:3])
```

```
           country  drives_right  cpc
US   United States          True  809
AUS      Australia         False  731
JPN          Japan         False  588
```


```python
# Print out fourth, fifth and sixth observations (positions 3, 4 and 5)
print(cars[3:6])
```

```
     country  drives_right  cpc
IN     India         False   18
RU    Russia          True  200
MOR  Morocco          True   70
```
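An integer slice inside square brackets selects rows by position, and, as with Python lists, the stop position is excluded. This minimal sketch rebuilds `cars` so it runs on its own and checks that `cars[3:6]` matches the positional `iloc` selection introduced in the next section, and that `cars[:3]` matches `head(3)`.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

# An integer slice inside [] selects rows by position; the stop position is excluded
middle = cars[3:6]
print(middle.index.tolist())          # ['IN', 'RU', 'MOR']
print(middle.equals(cars.iloc[3:6]))  # True
print(cars[:3].equals(cars.head(3)))  # True
```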


## loc and iloc
With `loc` and `iloc` you can do practically any data selection operation on DataFrames you can think of. `loc` is label-based, which means that you specify rows and columns by their row and column labels. `iloc` is integer-position based, which means that you specify rows and columns by their integer position, starting from 0.
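One practical consequence of label-based versus position-based selection is how slices end. This minimal sketch rebuilds `cars` so it runs on its own. It shows that a `loc` slice includes its stop label, an `iloc` slice excludes its stop position, and both accessors reach the same row for Japan.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

# loc slices by label, and the stop label IS included
label_slice = cars.loc['AUS':'IN']
print(label_slice.index.tolist())     # ['AUS', 'JPN', 'IN']

# iloc slices by position, and the stop position is NOT included
position_slice = cars.iloc[1:3]
print(position_slice.index.tolist())  # ['AUS', 'JPN']

# Both accessors reach the same row for Japan (label 'JPN', position 2)
print(cars.loc['JPN'].equals(cars.iloc[2]))  # True
```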

```python
# Print out observation for Japan using loc
print(cars.loc['JPN'])
```

```
country         Japan
drives_right    False
cpc               588
Name: JPN, dtype: object
```


```python
# Print out observation for Japan using iloc as Series
print(cars.iloc[2])
```

```
country         Japan
drives_right    False
cpc               588
Name: JPN, dtype: object
```


```python
# Print out observations for Australia and Egypt using loc as DataFrame
print(cars.loc[['AUS', 'EG']])
```

```
       country  drives_right  cpc
AUS  Australia         False  731
EG       Egypt          True   45
```


```python
# Print out observations for Australia and Egypt using iloc as DataFrame
print(cars.iloc[[1, 6]])
```

```
       country  drives_right  cpc
AUS  Australia         False  731
EG       Egypt          True   45
```
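If you know a row's label but need its position for `iloc`, you can use the index itself to translate. This minimal sketch rebuilds `cars` so it runs on its own. It uses `Index.get_loc` to turn 'AUS' and 'EG' into positions 1 and 6, then checks that the positional and label-based selections return the same DataFrame.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

labels = ['AUS', 'EG']
# Index.get_loc translates a (unique) row label into its integer position
positions = [cars.index.get_loc(label) for label in labels]
print(positions)  # [1, 6]
print(cars.iloc[positions].equals(cars.loc[labels]))  # True
```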


#### loc and iloc also allow you to select both rows and columns from a DataFrame. 

```python
# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
```

```
     country  drives_right
RU    Russia          True
MOR  Morocco          True
```
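The same sub-DataFrame can be selected with `iloc` by giving positions instead of labels for both rows and columns. This minimal sketch rebuilds `cars` so it runs on its own and checks that the two selections match. It also shows that one row plus one column returns a single value.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

by_label = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]
# Russia and Morocco sit at row positions 4 and 5; country and drives_right at column positions 0 and 1
by_position = cars.iloc[[4, 5], [0, 1]]
print(by_position.equals(by_label))  # True

# One row plus one column returns a single value
print(cars.loc['JPN', 'cpc'], cars.iloc[2, 2])  # 588 588
```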


##### It's also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma:

```python
# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])
```

```
US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
```
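Selecting a whole column with `loc` (by label) or `iloc` (by position) gives the same Series as plain square brackets. This minimal sketch rebuilds `cars` so it runs on its own and compares the three routes to the `drives_right` column.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

via_loc = cars.loc[:, 'drives_right']
via_iloc = cars.iloc[:, 1]           # drives_right is the second column (position 1)
via_brackets = cars['drives_right']
print(via_loc.equals(via_brackets), via_iloc.equals(via_brackets))  # True True
print(via_loc.dtype)  # bool
```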


```python
# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])
```

```
     drives_right
US           True
AUS         False
JPN         False
IN          False
RU           True
MOR          True
EG           True
```


```python
# Print out country and cpc columns (positions 0 and 2) as DataFrame
print(cars.iloc[:, [0, 2]])
```

```
           country  cpc
US   United States  809
AUS      Australia  731
JPN          Japan  588
IN           India   18
RU          Russia  200
MOR        Morocco   70
EG           Egypt   45
```
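Column selection follows the same label-versus-position rules as row selection. This minimal sketch rebuilds `cars` so it runs on its own. It checks that `iloc[:, [0, 2]]` matches the label-based `loc[:, ['country', 'cpc']]`, and it shows that a `loc` column slice includes its stop label while an `iloc` column slice excludes its stop position.

```python
import pandas as pd

# Same vehicle data and row labels as above, rebuilt so this block runs on its own
cars = pd.DataFrame(
    {
        'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
        'drives_right': [True, False, False, False, True, True, True],
        'cpc': [809, 731, 588, 18, 200, 70, 45],
    },
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

by_position = cars.iloc[:, [0, 2]]
by_label = cars.loc[:, ['country', 'cpc']]
print(by_position.equals(by_label))  # True

# iloc column slice: stop position 2 is excluded
iloc_cols = cars.iloc[:, 0:2].columns.tolist()
# loc column slice: stop label 'drives_right' is included
loc_cols = cars.loc[:, 'country':'drives_right'].columns.tolist()
print(iloc_cols)  # ['country', 'drives_right']
print(loc_cols)   # ['country', 'drives_right']
```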
