#### `Dictionariception`
Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries.

As an example, have a look at the script where another version of __europe__ - the dictionary you've been working with all along - is coded. The keys are still the country names, but the values are dictionaries that contain more information than just the capital.

It's perfectly possible to chain square brackets to select elements. To fetch the population for Spain from __europe__, for example, you need:

europe['spain']['population']


#### `Instructions`
- Use chained square brackets to select and print out the capital of France.
- Create a dictionary, named __data__, with the keys '__capital__' and '__population__'. Set them to '__rome__' and ___59.83___, respectively.
- Add a new key-value pair to __europe__; the key is '__italy__' and the value is __data__, the dictionary you just built.

In [1]:
# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# Print out the capital of France
print(europe['france']['capital'])

# Create sub-dictionary data
data = {'capital': 'rome',
        'population': 59.83}

# Add data to europe under key 'italy'
europe['italy'] = data

# Print europe
print(europe)

paris
{'spain': {'capital': 'madrid', 'population': 46.77}, 'france': {'capital': 'paris', 'population': 66.03}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'italy': {'capital': 'rome', 'population': 59.83}}
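
As a small sketch of how chained lookups work, the snippet below splits the chain into its two steps and shows one way to handle a key that might be missing. The trimmed `europe` dictionary and the `'portugal'` key are stand-ins chosen for illustration; they are not part of the exercise.

```python
# Trimmed copy of the nested dictionary from the exercise
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}

# Chained brackets work one level at a time: the first lookup returns
# the inner dictionary, the second reads a value from it
france = europe['france']
capital = france['capital']
assert capital == europe['france']['capital']

# A missing outer key raises KeyError with square brackets.
# get() with a default avoids that; 'portugal' is a hypothetical key.
missing = europe.get('portugal', {}).get('capital')

print(capital)   # paris
print(missing)   # None
```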


#### `Dictionary to DataFrame (1)`
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.

Three lists are defined in the script:

- __names__, containing the country names for which data is available.
- __dr__, a list of booleans that tells whether people drive on the right (___True___) or on the left (___False___) in the corresponding country.
- __cpc__, the number of motor vehicles per 1000 people in the corresponding country.

Each dictionary key is a column label and each value is a list which contains the column elements.


#### `Instructions`
- Import __pandas__ as _pd_.
- Use the pre-defined lists to create a dictionary called __my_dict__. There should be three key-value pairs:
  - key '__country__' and value __names__.
  - key '__drives_right__' and value __dr__.
  - key '__cars_per_cap__' and value __cpc__.
- Use __pd.DataFrame()__ to turn your dict into a DataFrame called __cars__.
- Print out __cars__ and see how beautiful it is.

In [2]:
# Import pandas as pd
import pandas as pd

# Pre-defined lists
names = ['United States', 'Australia', 'Japan',
         'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {
    'country': names,
    'drives_right': dr,
    'cars_per_cap': cpc
}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
cars

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
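
To make the mapping from dictionary to DataFrame concrete, here is a minimal sketch on a three-row subset of the data. It checks that the keys become column labels and that the rows get a default integer index. The mismatched-length dictionary at the end is an illustrative case that is not part of the exercise.

```python
import pandas as pd

# Three-row subset of the exercise data
my_dict = {'country': ['United States', 'Australia', 'Japan'],
           'drives_right': [True, False, False],
           'cars_per_cap': [809, 731, 588]}
cars = pd.DataFrame(my_dict)

print(list(cars.columns))  # ['country', 'drives_right', 'cars_per_cap']
print(cars.shape)          # (3, 3)
print(list(cars.index))    # [0, 1, 2]

# Every list becomes one column, so all lists need the same length
length_error = False
try:
    pd.DataFrame({'country': ['Japan', 'India'], 'cars_per_cap': [588]})
except ValueError:
    length_error = True
print(length_error)        # True
```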


You learned about dictionaries in Python, focusing on their unique properties and how to manipulate them. A dictionary is a collection of key-value pairs where each key must be unique and immutable. Here are the key points you covered:

- __Accessing Values__: You can retrieve a value by using its key in square brackets. For example, world['Albania'] returns the population of Albania.
- __Adding and Updating Entries__: You can add a new key-value pair or update an existing one using the same syntax. For example, ___world['Sealand'] = 27___ adds Sealand with a population of 27.
- __Removing Entries__: Use the ___del___ statement to remove a key-value pair. For example, ___del(world['Sealand'])___ removes Sealand from the dictionary.
- __Immutability of Keys__: Keys must be immutable types like strings, integers, or tuples. Mutable types like lists cannot be used as keys.

In [3]:
# Definition of dictionary
europe = {'spain': 'madrid', 'france': 'paris',
          'germany': 'berlin', 'norway': 'oslo'}

# Add italy to europe
europe['italy'] = 'rome'

# Print out italy in europe
print('italy' in europe)  # Outputs: True

# Add poland to europe
europe['poland'] = 'warsaw'

# Print europe
print(europe)

True
{'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw'}
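
The points in the summary above can be tried out in a few lines. This is only a sketch: the population figures and the `lookup` dictionary are illustrative values, not data from the course.

```python
# Illustrative values only
world = {'Albania': 2.77, 'Algeria': 39.21}

world['Sealand'] = 27        # add a new key-value pair
world['Sealand'] = 28        # same syntax updates the existing key
print(world['Sealand'])      # 28

del world['Sealand']         # remove the pair again
print('Sealand' in world)    # False

# Immutable objects such as tuples work as keys
lookup = {(0, 0): 'origin'}
print(lookup[(0, 0)])        # origin

# Mutable objects such as lists are rejected with a TypeError
list_key_rejected = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError:
    list_key_rejected = True
print(list_key_rejected)     # True
```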


#### `Dictionary to DataFrame (2)`
The Python code that solves the previous exercise is included in the script. Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6?

To fix this, a list __row_labels__ has been created. You can use it to specify the row labels of the __cars__ DataFrame by setting the __index__ attribute of __cars__, which you can access as __cars.index__.

#### `Instructions`
- Hit Run Code to see that, indeed, the row labels are not correctly set.
- Specify the row labels by setting __cars.index__ equal to __row_labels__.
- Print out __cars__ again and check if the row labels are correct this time.

In [6]:
import pandas as pd

# Build cars DataFrame
names = ['United States', 'Australia', 'Japan',
         'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}
cars = pd.DataFrame(cars_dict)
print(cars)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
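
As an alternative sketch, the row labels can also be passed while the DataFrame is built, through the `index` argument of `pd.DataFrame()`. The example uses a three-row subset and compares both routes. The too-short label list at the end is a hypothetical mistake added for illustration.

```python
import pandas as pd

cars_dict = {'country': ['United States', 'Australia', 'Japan'],
             'drives_right': [True, False, False],
             'cars_per_cap': [809, 731, 588]}
row_labels = ['US', 'AUS', 'JPN']

# Route 1: build first, then assign the index attribute
cars_a = pd.DataFrame(cars_dict)
cars_a.index = row_labels

# Route 2: pass the labels to the constructor
cars_b = pd.DataFrame(cars_dict, index=row_labels)

print(cars_a.equals(cars_b))   # True

# The number of labels has to match the number of rows
mismatch_rejected = False
try:
    cars_a.index = ['US', 'AUS']
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)       # True
```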


#### `CSV to DataFrame (1)`
Putting data in a dictionary and then building a DataFrame works, but it's not very efficient. What if you're dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for "comma-separated values".

To import CSV data into Python as a Pandas DataFrame you can use __read_csv()__.

Let's explore this function with the same cars data from the previous exercises. This time, however, the data is available in a CSV file, named __cars.csv__. It is available in your current working directory, so the path to the file is simply '__cars.csv__'.

#### `Instructions`
- To import CSV files you still need the __pandas__ package: import it as __pd__.
- Use __pd.read_csv()__ to import __cars.csv__ data as a DataFrame. Store this DataFrame as __cars__.
- Print out __cars__. Does everything look OK?

In [7]:
# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
cars = pd.read_csv('./datasets/cars.csv')

# Print out cars
print(cars)

  Unnamed: 0  cars_per_cap        country  drives_right
0         US           809  United States          True
1        AUS           731      Australia         False
2        JAP           588          Japan         False
3         IN            18          India         False
4         RU           200         Russia          True
5        MOR            70        Morocco          True
6         EG            45          Egypt          True
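
A common way for an `Unnamed: 0` column to arise is that the file was written from a DataFrame whose row labels had no name. Whether __cars.csv__ was produced this way is an assumption. The sketch below reproduces the effect in memory with a two-row DataFrame and `io.StringIO` instead of a file on disk.

```python
import io
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731],
                     'country': ['United States', 'Australia'],
                     'drives_right': [True, False]},
                    index=['US', 'AUS'])

# to_csv() without a path returns the CSV text; the row labels are
# written as the first column, with an empty header field
csv_text = cars.to_csv()
header = csv_text.splitlines()[0]
print(header)                    # ,cars_per_cap,country,drives_right

# Reading it back without further arguments names that column 'Unnamed: 0'
reloaded = pd.read_csv(io.StringIO(csv_text))
print(list(reloaded.columns))
# ['Unnamed: 0', 'cars_per_cap', 'country', 'drives_right']
```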


#### `CSV to DataFrame (2)`
Your __read_csv()__ call to import the CSV data didn't generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name.

Remember __index_col__, an argument of __read_csv()__, that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!

Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import?

#### `Instructions`
- Run the code and check that the first column should actually be used as row labels.
- Specify the __index_col__ argument inside __pd.read_csv()__: set it to ___0___, so that the first column is used as row labels.
- Has the printout of __cars__ improved now?

In [1]:
# Import pandas as pd
import pandas as pd

# Fix import by including index_col
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out cars
print(cars)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
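
The effect of __index_col__ can be checked side by side on a small in-memory CSV. This is a sketch that assumes a two-row excerpt in the same layout as __cars.csv__, read through `io.StringIO` so that no file is needed.

```python
import io
import pandas as pd

# Two-row excerpt in the layout of cars.csv (first header field is empty)
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
"""

default = pd.read_csv(io.StringIO(csv_text))
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.index))   # [0, 1]
print(list(fixed.index))     # ['US', 'AUS']
print(list(fixed.columns))   # ['cars_per_cap', 'country', 'drives_right']
```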


#### `Square Brackets (1)`
In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful way, is to use square brackets.

In the sample code, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the __cars_per_cap__ column from __cars__, you can use:

___cars['cars_per_cap']___

___cars[['cars_per_cap']]___

The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.

#### `Instructions`
- Use single square brackets to print out the __country__ column of __cars__ as a Pandas Series.
- Use double square brackets to print out the __country__ column of __cars__ as a Pandas DataFrame.
- Use double square brackets to print out a DataFrame with both the __country__ and __drives_right__ columns of __cars__, in this order.

In [5]:
# Import cars data
import pandas as pd
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])

US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country
US   United States
AUS      Australia
JAP          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
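
To see the Series/DataFrame difference without reading the printout closely, the types and shapes can be inspected directly. The sketch builds a three-row stand-in for __cars__ inline rather than loading the CSV file.

```python
import pandas as pd

# Three-row stand-in for the cars DataFrame
cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JAP'])

as_series = cars['country']      # single brackets
as_frame = cars[['country']]     # double brackets: a list of column labels

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (3,)
print(as_frame.shape)            # (3, 1)
```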


#### `Square Brackets (2)`
Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the __cars__ DataFrame:

___cars[0:5]___

The result is another DataFrame containing only the rows you specified.

Pay attention: You can only select rows using square brackets if you specify a slice, like ___0:4___. Also, you're using the integer indexes of the rows here, not the row labels!

#### `Instructions`
- Select the first ___3___ observations from __cars__ and print them out.
- Select the fourth, fifth and sixth observation, corresponding to row indexes ___3___, ___4___ and ___5___, and print them out.

In [1]:
# Import cars data
import pandas as pd
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out first 3 observations
print(cars[0:3])

# Print out fourth, fifth and sixth observation
print(cars[3:6])

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
     cars_per_cap  country  drives_right
IN             18    India         False
RU            200   Russia          True
MOR            70  Morocco          True
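
The slice-only rule can be verified with a short sketch. It uses a three-row stand-in for __cars__ built inline, compares a bracket slice with the equivalent __iloc__ call, and shows that a single integer inside square brackets is treated as a column label instead.

```python
import pandas as pd

# Three-row stand-in for the cars DataFrame
cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JAP'])

# A slice inside square brackets selects rows by position
first_two = cars[0:2]
print(list(first_two.index))               # ['US', 'AUS']
print(first_two.equals(cars.iloc[0:2]))    # True

# A single integer is looked up as a column label, and no column is named 0
single_int_failed = False
try:
    cars[0]
except KeyError:
    single_int_failed = True
print(single_int_failed)                   # True
```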


#### `loc and iloc (1)`
With __loc__ and __iloc__ you can do practically any data selection operation on DataFrames you can think of. __loc__ is label-based, which means that you have to specify rows and columns based on their row and column labels. __iloc__ is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise.

Try out the following commands to experiment with __loc__ and __iloc__ to select observations. Each pair of commands here gives the same result.

In [4]:
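# Each pair selects the same data, once by label (loc) and once by position (iloc).
# Only the print() calls and the final expression of the cell produce output.

# Russia as a Series
cars.loc['RU']
print(cars.iloc[4])
print()
# Russia as a one-row DataFrame
cars.loc[['RU']]
print(cars.iloc[[4]])

# Russia and Australia as a DataFrame
cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]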

cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

    cars_per_cap country  drives_right
RU           200  Russia          True


     cars_per_cap    country  drives_right
RU            200     Russia          True
AUS           731  Australia         False


#### `Instructions`
- Use __loc__ or __iloc__ to select the observation corresponding to Japan as a Series. The label of this row is ___JAP___, the index is ___2___. Make sure to print the resulting Series.
- Use __loc__ or __iloc__ to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting __cars__. Make sure to print the resulting DataFrame.

In [14]:
# Import cars data
import pandas as pd
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out observation for Japan
print(cars.loc['JAP'])

# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True
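
The label/position correspondence can be checked programmatically. The sketch below rebuilds the seven-row __cars__ data inline (an assumption made to keep the example self-contained) and uses `Index.get_loc()` to translate a row label into its integer position.

```python
import pandas as pd

# Inline copy of the cars data, in the column order of cars.csv
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'])

# Translate a row label into its integer position
position = cars.index.get_loc('JAP')
print(position)                                     # 2

# Single label or position: a Series
print(cars.loc['JAP'].equals(cars.iloc[2]))         # True

# List of labels or positions: a DataFrame
by_label = cars.loc[['AUS', 'EG']]
by_position = cars.iloc[[1, 6]]
print(by_label.equals(by_position))                 # True
```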


#### `loc and iloc (2)`
__loc__ and __iloc__ also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands. Again, paired commands produce the same result.

In [27]:
print(cars.loc['IN', 'cars_per_cap'])
print(cars.iloc[3, 0])
print()
print(cars.loc[['IN', 'RU'], 'cars_per_cap'])
print()
print(cars.iloc[[3, 4], 0])

cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]

18
18

IN     18
RU    200
Name: cars_per_cap, dtype: int64

IN     18
RU    200
Name: cars_per_cap, dtype: int64


    cars_per_cap country
IN            18   India
RU           200  Russia


#### `Instructions`
- Print out the __drives_right__ value of the row corresponding to Morocco (its row label is __MOR__).
- Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns __country__ and __drives_right__.

In [24]:
# Import cars data
import pandas as pd
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out drives_right value of Morocco
print(cars.loc['MOR','drives_right'])

# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])

True
     country  drives_right
RU    Russia          True
MOR  Morocco          True
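
As a quick cross-check of row-and-column selection, the sketch below performs the same two selections with __loc__ and with __iloc__ and compares the results. The seven-row DataFrame is rebuilt inline so that the example runs without the CSV file.

```python
import pandas as pd

# Inline copy of the cars data, in the column order of cars.csv
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'])

# One row label and one column label: a single value
value = cars.loc['MOR', 'drives_right']
print(value)                                   # True
print(value == cars.iloc[5, 2])                # True

# Lists of rows and columns: a sub-DataFrame
sub_label = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]
sub_position = cars.iloc[[4, 5], [1, 2]]
print(sub_label.equals(sub_position))          # True
print(sub_label.shape)                         # (2, 2)
```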


__.loc[]__ and __.iloc[]__ are excellent tools for selecting DataFrame values by label and index. In the next exercise, you'll select entire columns using them!

#### `loc and iloc (3)`
It's also possible to select only columns with __loc__ and __iloc__. In both cases, you simply put a slice going from beginning to end in front of the comma:

In [28]:
print(cars.loc[:, 'country'])
print(cars.iloc[:, 1])

print(cars.loc[:, ['country', 'drives_right']])
print(cars.iloc[:, [1, 2]])

US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True


#### `Instructions`
- Print out the __drives_right__ column as a Series using __loc__ or __iloc__.
- Print out the __drives_right__ column as a DataFrame using __loc__ or __iloc__.
- Print out both the __cars_per_cap__ and __drives_right__ columns as a DataFrame using __loc__ or __iloc__.

In [37]:
# Import cars data
import pandas as pd
cars = pd.read_csv('./datasets/cars.csv', index_col=0)

# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])

# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])

# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])

US      True
AUS    False
JAP    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
     drives_right
US           True
AUS         False
JAP         False
IN          False
RU           True
MOR          True
EG           True
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JAP           588         False
IN             18         False
RU            200          True
MOR            70          True
EG             45          True
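
One detail worth knowing when slicing columns: a label slice with __loc__ includes both endpoints, while an integer slice with __iloc__ excludes the stop position, as usual in Python. The sketch below illustrates this on a three-row stand-in for __cars__ built inline.

```python
import pandas as pd

# Three-row stand-in for the cars DataFrame
cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan'],
                     'drives_right': [True, False, False]},
                    index=['US', 'AUS', 'JAP'])

# Label slice: both 'cars_per_cap' and 'country' are included
by_label = cars.loc[:, 'cars_per_cap':'country']

# Integer slice: position 2 is excluded, so columns 0 and 1 are returned
by_position = cars.iloc[:, 0:2]

print(list(by_label.columns))          # ['cars_per_cap', 'country']
print(by_label.equals(by_position))    # True
```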

