# 1) CSV to DataFrame

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data, where you can label the rows and the columns.

In the exercises that follow, you will be working with vehicle data from different countries. Each observation corresponds to a country, and the columns give information about the number of vehicles per capita, whether people drive on the left or the right, and so on. This data is available in a CSV file named cars.csv in your current working directory, so the path to the file is simply 'cars.csv'.

To import CSV data into Python as a Pandas DataFrame, you can use read_csv().

**Instructions**
- To import CSV files, you still need the pandas package: import it as pd.
- Use pd.read_csv() to import the cars.csv data as a DataFrame. Store this DataFrame as cars.
- Print out cars. Does everything look OK?

In [5]:
# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
cars = pd.read_csv("cars.csv")

# Print out cars
cars.head()

|   | Unnamed: 0 | cars_per_cap | country | drives_right |
|---|---|---|---|---|
| 0 | US | 809 | United States | True |
| 1 | AUS | 731 | Australia | False |
| 2 | JAP | 588 | Japan | False |
| 3 | IN | 18 | India | False |
| 4 | RU | 200 | Russia | True |


Your read_csv() call to import the CSV data didn't generate an error, but the output is not quite what you want: the row labels were imported as an ordinary column with no name of its own (pandas labels it Unnamed: 0).

Remember index_col, an argument of read_csv() that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!

Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import?

**Instructions**
- Run the code with Submit Answer and confirm that the first column should actually be used as row labels.
- Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels.
- Has the printout of cars improved now?

In [14]:
# Import pandas as pd
import pandas as pd

# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col=0)

# Print out cars
cars.head(10)

|   | cars_per_cap | country | drives_right |
|---|---|---|---|
| US | 809 | United States | True |
| AUS | 731 | Australia | False |
| JAP | 588 | Japan | False |
| IN | 18 | India | False |
| RU | 200 | Russia | True |
| MOR | 70 | Morocco | True |
| EG | 45 | Egypt | True |
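
To see the effect of index_col without the cars.csv file at hand, the following is a minimal sketch that reads the same rows from an in-memory string. The csv_text variable and the use of io.StringIO are assumptions made for illustration; the exercise itself reads from 'cars.csv'.

```python
import io
import pandas as pd

# Stand-in for cars.csv, built from the rows shown in the output above.
# The empty first header field mimics a row-label column without a name.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True
"""

# Without index_col, the row labels land in a regular column
# and the rows get default integer labels 0, 1, 2, ...
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With index_col=0, the first column becomes the row labels.
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(cars.index))
print(list(cars.columns))
```

In the first result the labels sit in a column that pandas names Unnamed: 0; in the second they form the index, and only the three data columns remain.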


# 2) Square Brackets

In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful, way is to use square brackets.

In the sample code below, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

```python
cars['cars_per_cap']
cars[['cars_per_cap']]
```
The single-bracket version gives a Pandas Series; the double-bracket version gives a Pandas DataFrame.

**Instructions**
- Use single square brackets to print out the country column of cars as a Pandas Series.
- Use double square brackets to print out the country column of cars as a Pandas DataFrame. Do this by putting country in two square brackets this time.

In [7]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country
US   United States
AUS      Australia
JAP          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
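
The Series-versus-DataFrame distinction can be checked directly by inspecting the type and shape of each result. The sketch below builds a small stand-in for cars by hand (three rows copied from the output above) so that it runs without the CSV file; the names as_series and as_frame are illustrative only.

```python
import pandas as pd

# Small stand-in for the cars data, using values from the output above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
        "drives_right": [True, False, False],
    },
    index=["US", "AUS", "JAP"],
)

as_series = cars["country"]    # single brackets: one-dimensional Series
as_frame = cars[["country"]]   # double brackets: DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)

# The inner brackets are a plain Python list, so several columns fit too.
two_cols = cars[["country", "drives_right"]]
print(list(two_cols.columns))
```

The inner pair of brackets is simply a list of column labels, which is why the double-bracket form keeps the table structure even for a single column.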


# 3) loc

With loc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels.

Try out the following commands in the IPython Shell to experiment with loc to select observations:

In [9]:
cars.loc['RU']

cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object

In [10]:
cars.loc[['RU']]

|   | cars_per_cap | country | drives_right |
|---|---|---|---|
| RU | 200 | Russia | True |


In [11]:
cars.loc[['RU', 'AUS']]

|   | cars_per_cap | country | drives_right |
|---|---|---|---|
| RU | 200 | Russia | True |
| AUS | 731 | Australia | False |


As before, code is included that imports the cars data as a Pandas DataFrame.

**Instructions**
- Use `loc` to select the observation corresponding to Japan as a Series. The label of this row is JAP. Make sure to print the resulting Series.
- Use `loc` to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels of these rows by inspecting cars in the IPython Shell. Make sure to print the resulting DataFrame.

In [15]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out observation for Japan
print(cars.loc['JAP'])

# Print out observations for Australia and Egypt
print(cars.loc[['AUS','EG']])

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
     cars_per_cap    country drives_right
AUS           731  Australia        False
EG             45      Egypt         True
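
As a rough illustration of what "label-based" means for row selection, the sketch below rebuilds a few rows of cars by hand (an assumption, so that it runs without the CSV file) and compares a single label, a list of labels, and an integer that is not a label.

```python
import pandas as pd

# Hand-built stand-in for part of the cars data shown above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [731, 588, 45],
        "country": ["Australia", "Japan", "Egypt"],
        "drives_right": [False, False, True],
    },
    index=["AUS", "JAP", "EG"],
)

row = cars.loc["JAP"]          # single label: the row comes back as a Series
sub = cars.loc[["EG", "AUS"]]  # list of labels: DataFrame, rows in the order asked for

print(type(row).__name__, row.name)  # Series JAP
print(list(sub.index))               # ['EG', 'AUS']

# loc looks up labels, not positions: 0 is not a row label in this index.
try:
    cars.loc[0]
    found = True
except KeyError:
    found = False
print(found)  # False
```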


`loc` also allows you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell.

In [16]:
cars.loc['IN', 'cars_per_cap']

18

In [17]:
cars.loc[['IN', 'RU'], 'cars_per_cap']

IN     18
RU    200
Name: cars_per_cap, dtype: int64

In [18]:
cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]

|   | cars_per_cap | country |
|---|---|---|
| IN | 18 | India |
| RU | 200 | Russia |


**Instructions**
- Print out the drives_right value of the row corresponding to Morocco (its row label is MOR).
- Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right.

In [19]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])

# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])

True
     country drives_right
RU    Russia         True
MOR  Morocco         True
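
The pattern across these loc calls is that a single label drops a dimension, while a list of labels keeps it. The sketch below checks this on a hand-built stand-in for three rows of cars (an assumption, so that it runs without the CSV file); the variable names are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for part of the cars data shown above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [18, 200, 70],
        "country": ["India", "Russia", "Morocco"],
        "drives_right": [False, True, True],
    },
    index=["IN", "RU", "MOR"],
)

value = cars.loc["MOR", "drives_right"]                     # label, label: single value
column = cars.loc[["RU", "MOR"], "cars_per_cap"]            # list, label: Series
table = cars.loc[["RU", "MOR"], ["country", "drives_right"]]  # list, list: DataFrame
cell = cars.loc[["MOR"], ["drives_right"]]                  # one-element lists: 1x1 DataFrame

print(value)                                # True
print(type(column).__name__, column.shape)  # Series (2,)
print(type(table).__name__, table.shape)    # DataFrame (2, 2)
print(type(cell).__name__, cell.shape)      # DataFrame (1, 1)
```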
