# Pandas data frames

In this notebook, we will start learning about pandas DataFrames.

Before you can import the pandas library into this notebook, you need to install it if you haven't already done so (for example, by running `pip install pandas` in a terminal).


Then you can import the pandas library into this notebook as follows:

In [None]:
# import the pandas library and alias as pd
import pandas as pd

### Loading a data file into a pandas DataFrame

To load a .csv data file into our notebook, we use the `read_csv()` function from the pandas library. Make sure that you have saved the `gapminder.csv` file in a `data` subfolder inside the folder where this notebook is saved.
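As a minimal sketch of what `read_csv()` does, the example below parses a tiny CSV held in memory instead of reading a file from disk. The column names and values are made up for illustration (they are not real gapminder data), and `io.StringIO` simply stands in for a file path such as `data/gapminder.csv`.

```python
import io

import pandas as pd

# A tiny made-up CSV (hypothetical values, not real gapminder data)
csv_text = """country,year,lifeExp
Atlantis,1952,40.5
Atlantis,1957,42.1
Lemuria,1952,38.0
"""

# read_csv() accepts a file path or a file-like object such as StringIO;
# the first line of the CSV becomes the column names
toy = pd.read_csv(io.StringIO(csv_text))
print(toy)
```

Each remaining line of the CSV becomes one row of the resulting DataFrame.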

Let's load the gapminder dataset:

In [None]:
# read the csv file living in data/gapminder.csv into a pandas dataframe
pd.read_csv("data/gapminder.csv")

This displays the gapminder dataset, but it doesn't save it.

To save the dataset so that we can use it elsewhere in our notebook, we can assign the result of the `pd.read_csv()` function to a variable called `gapminder`:

In [None]:
# Save the above dataframe as a variable called gapminder
gapminder = pd.read_csv("data/gapminder.csv")

To view the dataset we have just loaded, we can type the name of the variable that we saved it in:

In [None]:
# look at the gapminder dataframe object
gapminder

We can then use the same `type()` function that we used in the previous notebook to ask what kind of object the `gapminder` variable is (the answer is a pandas DataFrame):

In [None]:
# check the type of the dataframe object
type(gapminder)

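The same check works on any DataFrame. In this sketch, a made-up two-row CSV (hypothetical values) is parsed and saved in a variable, and `isinstance()` is used alongside `type()` to confirm that the result is a pandas DataFrame.

```python
import io

import pandas as pd

# Hypothetical two-row CSV, parsed and saved in a variable so later code can reuse it
csv_text = "country,year\nAtlantis,1952\nLemuria,1957\n"
toy = pd.read_csv(io.StringIO(csv_text))

# type() reports the class of the object; isinstance() checks it directly
print(type(toy))
print(isinstance(toy, pd.DataFrame))  # True
```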
## Extracting information/attributes from DataFrames

In this section, we will learn how to extract attributes from DataFrame objects and how to apply DataFrame-specific "methods" to DataFrames, both using the `.` syntax.

As a reminder, let's print out the `gapminder` DataFrame that we're working with:

In [None]:
# look at gapminder again
gapminder

### The shape attribute

To extract an attribute from an object in Python, we use the `object.attribute` syntax. So if we want to extract the `shape` attribute from the `gapminder` DataFrame object, we can do so as follows:

In [None]:
# extract the shape attribute from gapminder
gapminder.shape

The `shape` attribute is a tuple containing the number of rows (1704) and the number of columns (6), which is helpful for learning about the size of our data objects.
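Because `shape` is an ordinary Python tuple, its two numbers can be unpacked into separate variables. This sketch uses a tiny made-up DataFrame (hypothetical values) in place of `gapminder`.

```python
import pandas as pd

# Made-up 3-row, 2-column DataFrame (hypothetical values)
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Lemuria"],
    "year": [1952, 1957, 1952],
})

# shape is a plain tuple (rows, columns), so it can be unpacked
n_rows, n_cols = toy.shape
print(toy.shape)       # (3, 2)
print(n_rows, n_cols)  # 3 2

# shape is an attribute, not a method: writing toy.shape() would raise
# a TypeError because a tuple cannot be called
```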

### The head() method

`head()` returns the first few rows of a DataFrame (5 by default). However, `head()` is not a regular function. If it were, we would be able to apply it like this:

In [None]:
# try to look at the first 5 rows of the gapminder dataset using the head() function
head(gapminder)

But this results in an error (a `NameError`), because there is no standalone `head()` function that can be applied in the regular way.

Instead, `head()` is a **method**, which is applied using the `object.method()` syntax rather than the `function(object)` syntax above.

We can apply the `head()` method to the `gapminder` dataset as follows, which will print out the first 5 rows of the DataFrame:

In [None]:
# apply the head() method to gapminder
gapminder.head()

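The contrast between the two calling styles can be reproduced on a small made-up DataFrame (the `year` values below are hypothetical): the plain function call raises a `NameError`, while the method call returns the first 5 rows.

```python
import pandas as pd

# Hypothetical DataFrame with 12 rows
toy = pd.DataFrame({"year": range(1952, 2008, 5)})

# head(toy) fails: there is no standalone function called head
try:
    head(toy)
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)  # NameError: name 'head' is not defined

# toy.head() works because head() is a method of the DataFrame
first_five = toy.head()
print(first_five)
```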
### Arguments

You can provide additional arguments to the `head()` method inside the parentheses. For example, if you want to print 10 rows instead of 5, you can do so as follows:

In [None]:
# apply the head() method to gapminder with an argument of 10
gapminder.head(10)

This argument has a name `n`, which you can explicitly specify:

In [None]:
# apply the head method to gapminder with a *named* argument of n=10
gapminder.head(n=10)

You don't need to write the `n=` part, however: `n` is the first argument of `head()`, so a value passed by position is interpreted as the number of rows to return.
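A quick way to confirm that the positional and named forms are equivalent is to compare their results. This sketch assumes a made-up 12-row DataFrame rather than `gapminder`.

```python
import pandas as pd

# Hypothetical DataFrame with 12 rows
toy = pd.DataFrame({"year": range(1952, 2008, 5)})

ten_positional = toy.head(10)  # n passed by position
ten_named = toy.head(n=10)     # n passed by name

print(len(ten_positional), len(ten_named))  # 10 10
print(ten_positional.equals(ten_named))     # True
```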

### Exercise

1. A pandas DataFrame has an attribute called `dtypes` that contains the data *type* of each column. Extract the `dtypes` attribute from the `gapminder` DataFrame:

In [None]:
# extract the dtypes attribute from gapminder
gapminder.dtypes

Note that pandas stores text ("string") columns with the `object` dtype (newer pandas releases may instead show a dedicated string dtype).
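As a sketch of what `dtypes` contains, here is a made-up DataFrame (hypothetical column names and values) with one text, one integer and one decimal column. Since the dtype name shown for text columns can vary between pandas versions, the helper functions in `pandas.api.types` are used to check each column's kind.

```python
import pandas as pd

# Hypothetical mixed-type DataFrame
toy = pd.DataFrame({
    "country": ["Atlantis", "Lemuria"],
    "year": [1952, 1957],
    "lifeExp": [40.5, 42.1],
})

# dtypes is an attribute (no parentheses): a Series mapping each column name to its dtype
print(toy.dtypes)

# The pandas.api.types helpers check a dtype's kind without relying on its exact name
print(pd.api.types.is_integer_dtype(toy.dtypes["year"]))     # True
print(pd.api.types.is_float_dtype(toy.dtypes["lifeExp"]))    # True
print(pd.api.types.is_numeric_dtype(toy.dtypes["country"]))  # False
```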

2. A pandas DataFrame has a method called `select_dtypes()` that extracts just the columns of a certain type. Use the `select_dtypes()` method to extract the numeric (float and integer) columns of `gapminder` by providing the argument `include='number'` inside the parentheses of `select_dtypes()`.

In [None]:
# use the select_dtypes() method to select only the columns of type number
gapminder.select_dtypes(include='number')

In [None]:
# to instead extract the string/object-type columns:
gapminder.select_dtypes(include='object')
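For comparison, here is the same idea on a made-up DataFrame (hypothetical column names and values). The sketch uses `exclude='number'` as an alternative way to pick out the non-numeric columns; it selects the text column whether pandas stores it as `object` or as a dedicated string dtype.

```python
import pandas as pd

# Hypothetical mixed-type DataFrame
toy = pd.DataFrame({
    "country": ["Atlantis", "Lemuria"],
    "year": [1952, 1957],
    "lifeExp": [40.5, 42.1],
})

# Keep only numeric (integer and float) columns, in their original order
numeric_cols = toy.select_dtypes(include="number")
print(list(numeric_cols.columns))  # ['year', 'lifeExp']

# Keep every column that is not numeric
non_numeric_cols = toy.select_dtypes(exclude="number")
print(list(non_numeric_cols.columns))  # ['country']
```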