# CSV to DataFrame

The DataFrame is one of Pandas' most important data structures. It stores tabular data and lets you label both the rows and the columns.

In the exercises that follow, you will be working with vehicle data from different countries. Each observation corresponds to a country, and the columns give information such as the number of vehicles per capita and whether people drive on the left or the right. The data is stored in a CSV file named cars.csv in your current working directory, so the path to the file is simply 'cars.csv'.

To import CSV data into Python as a Pandas DataFrame, you can use read_csv().

In [None]:
# Import pandas as pd
import pandas as pd

# Import the cars.csv data: cars
cars = pd.read_csv("cars.csv")

# Print out cars
print(cars)

Your read_csv() call to import the CSV data didn't generate an error, but the output is not quite what you want: the row labels were imported as an extra column that has no name.

Remember index_col, an argument of read_csv() that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!

Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import?

In [None]:
# Import pandas as pd
import pandas as pd

# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col=0)

# Print out cars
print(cars)
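
The effect of index_col can also be checked without the course file. The sketch below is only an illustration: the CSV text is a small hypothetical stand-in for cars.csv, and the names `csv_text`, `plain`, and `labeled` are made up for this example. It assumes the first column of the file has an empty header, which is what produces the unnamed column described above.

```python
import io

import pandas as pd

# Hypothetical stand-in for cars.csv: the first column holds the row labels
# and has an empty header.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# Without index_col, the label column is read as ordinary data and pandas
# gives it the placeholder name "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the first column becomes the row index instead.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(labeled.index.tolist())
print(labeled.columns.tolist())
```

Comparing the two printed column lists shows that the label column has moved out of the data and into the index.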

# Square Brackets

In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest way, though not the most powerful, is to use square brackets.

In the sample code below, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

    cars['cars_per_cap']
    cars[['cars_per_cap']] 

The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.

In [None]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out the Type column as a Pandas Series
print(cars["Type"])

# Print out the Type column as a Pandas DataFrame
print(cars[["Type"]])
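
To confirm the difference between single and double brackets, you can inspect the type of each result. The following is a minimal sketch on a hypothetical two-row DataFrame (`cars_demo` and its values are invented for illustration; only the column name "Type" is taken from the cell above).

```python
import pandas as pd

# Hypothetical data, standing in for the cars DataFrame.
cars_demo = pd.DataFrame(
    {"Type": ["Sedan", "SUV"], "Origin": ["Europe", "Asia"]},
    index=["Audi", "Toyota"],
)

as_series = cars_demo["Type"]    # single brackets: one column label
as_frame = cars_demo[["Type"]]   # double brackets: a list of column labels

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1): still two-dimensional
```

The inner brackets in the second form are an ordinary Python list, so the same syntax extends to several columns, for example `cars_demo[["Type", "Origin"]]`.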

# loc

With loc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels.

Try out the following commands in the IPython Shell to experiment with loc to select observations:

    cars.loc['RU']
    cars.loc[['RU']]
    cars.loc[['RU', 'AUS']]

As before, code is included that imports the cars data as a Pandas DataFrame.

In [2]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Use the Make column as row labels, then print out the observations for Audi
cars.index = cars["Make"]
print(cars.loc["Audi"])

# Print out the observations for Audi and Toyota
print(cars.loc[['Audi', 'Toyota']])

      Make                              Model    Type  Origin DriveTrain  \
Make                                                                       
Audi  Audi                        A4 1.8T 4dr   Sedan  Europe      Front   
Audi  Audi             A41.8T convertible 2dr   Sedan  Europe      Front   
Audi  Audi                         A4 3.0 4dr   Sedan  Europe      Front   
Audi  Audi          A4 3.0 Quattro 4dr manual   Sedan  Europe        All   
Audi  Audi            A4 3.0 Quattro 4dr auto   Sedan  Europe        All   
Audi  Audi                         A6 3.0 4dr   Sedan  Europe      Front   
Audi  Audi                 A6 3.0 Quattro 4dr   Sedan  Europe        All   
Audi  Audi             A4 3.0 convertible 2dr   Sedan  Europe      Front   
Audi  Audi     A4 3.0 Quattro convertible 2dr   Sedan  Europe        All   
Audi  Audi           A6 2.7 Turbo Quattro 4dr   Sedan  Europe        All   
Audi  Audi                 A6 4.2 Quattro 4dr   Sedan  Europe        All   
Audi  Audi  
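
Because the Make labels repeat, a single label passed to loc can match several rows, which changes the type of the result. The sketch below illustrates this on hypothetical data (`cars_demo`, its rows, and the variable names are assumptions made for this example, not part of the course files).

```python
import pandas as pd

# Hypothetical rows: "Audi" appears twice, "Toyota" once.
cars_demo = pd.DataFrame(
    {
        "Make": ["Audi", "Audi", "Toyota"],
        "Model": ["A4", "A6", "Camry"],
        "Origin": ["Europe", "Europe", "Asia"],
    }
)

# Unique row labels: a single label selects one row, returned as a Series.
by_model = cars_demo.set_index("Model")
one_row = by_model.loc["A4"]

# Repeated row labels: a single label that matches several rows
# returns a DataFrame with all of them.
by_make = cars_demo.copy()
by_make.index = by_make["Make"]
audi_rows = by_make.loc["Audi"]

# A list of labels always returns a DataFrame.
audi_and_toyota = by_make.loc[["Audi", "Toyota"]]

print(type(one_row).__name__)    # Series
print(type(audi_rows).__name__)  # DataFrame
print(audi_and_toyota.shape)     # (3, 3)
```

If you need a DataFrame regardless of how many rows match, passing the label inside a list, as in `by_make.loc[["Audi"]]`, is the more predictable form.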

loc also allows you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell.

    cars.loc['IN', 'cars_per_cap']
    cars.loc[['IN', 'RU'], 'cars_per_cap']
    cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]


In [None]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Use the Make column as row labels
cars.index = cars["Make"]

# Print out a sub-DataFrame: Origin and MSRP for Audi and Toyota
print(cars.loc[['Audi', 'Toyota'], ['Origin', 'MSRP']])
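
The three row-and-column forms of loc shown earlier return three different kinds of objects. Here is a small sketch, assuming a hypothetical DataFrame with unique row labels (`cars_demo` and its MSRP figures are invented for illustration).

```python
import pandas as pd

# Hypothetical data with unique row labels.
cars_demo = pd.DataFrame(
    {"Origin": ["Europe", "Asia", "USA"], "MSRP": [25940, 19560, 22180]},
    index=["Audi", "Toyota", "Ford"],
)

# One row label, one column label: a single value.
value = cars_demo.loc["Audi", "MSRP"]

# A list of row labels, one column label: a Series.
column = cars_demo.loc[["Audi", "Toyota"], "MSRP"]

# A list of row labels, a list of column labels: a DataFrame.
sub = cars_demo.loc[["Audi", "Toyota"], ["Origin", "MSRP"]]

print(value)
print(type(column).__name__)  # Series
print(sub.shape)              # (2, 2)
```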