# Working with Pandas Data Frames

In this notebook, we'll load some data as a Pandas Data Frame and do some analysis on the data.

## 1.0 Load and clean data

We'll download our data using `wget` and then load it into a pandas data frame.

In [None]:
!wget https://raw.githubusercontent.com/IBM/python-and-analytics/master/data/cfpbciti.csv

By convention, we assign the data frame to the variable `df`.

In [None]:
import pandas as pd
df = pd.read_csv('cfpbciti.csv')
df.head(5)

### 1.1 Work with columns and rows
We can perform a variety of actions on the columns and rows of the data frame.
For example, if there is a column we wish to drop, we use the `df.drop()` method.
(We ignore errors for missing keys so that this step does not depend on a particular data set.)




In [None]:
# `columns=` already names the axis, so a separate `axis=1` is redundant
df = df.drop(columns=['Submitted via'], errors='ignore')
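
As a minimal sketch of how `errors='ignore'` behaves, the cell below uses a small made-up data frame. The name `toy` and its values are illustrative assumptions, not taken from the downloaded data.

```python
import pandas as pd

# Hypothetical toy data, not from cfpbciti.csv
toy = pd.DataFrame({'Product': ['Mortgage', 'Credit card'],
                    'Submitted via': ['Web', 'Phone']})

# Dropping an existing column removes it
dropped = toy.drop(columns=['Submitted via'], errors='ignore')
print(list(dropped.columns))

# Dropping a column that is no longer present is silently skipped
unchanged = dropped.drop(columns=['Submitted via'], errors='ignore')
print(list(unchanged.columns))

# Without errors='ignore', the same call raises a KeyError
try:
    dropped.drop(columns=['Submitted via'])
    raised = False
except KeyError:
    raised = True
print(raised)
```

This is why the drop cell can be re-run safely even after the column is already gone.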

We can print the first n rows using `df.head(n)`.

In [None]:
df.head(5)

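To access a row, we can use the pandas `loc` indexer, which selects rows by index label. Because our data frame has the default integer index, `df.loc[1]` returns the row labeled 1, which is the second row.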

In [None]:
row_1 = df.loc[1]
row_1
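
To make the difference between label-based and position-based access visible, here is a small sketch on a made-up data frame whose index starts at 1. The name `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data with index labels 1, 2, 3 (not the default 0, 1, 2)
toy = pd.DataFrame({'Product': ['Mortgage', 'Credit card', 'Student loan']},
                   index=[1, 2, 3])

by_label = toy.loc[1]      # row whose index label is 1 (the first row here)
by_position = toy.iloc[1]  # row at position 1, counting from 0 (the second row)

print(by_label['Product'])
print(by_position['Product'])
```

With the default index produced by `read_csv`, labels and positions coincide, so `loc[1]` and `iloc[1]` return the same row.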

### 1.2 Examine the data types and statistics of the features
When we run `df.info()`, we see the total number of entries (rows) and, for each feature (column), its name, the number of non-null entries, and its data type (object, int64, float64, etc.).

In [None]:
df.info()
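
`df.info()` prints its report rather than returning it. If the counts are needed as values, one possible approach is sketched below on a made-up data frame. The name `toy` and its contents are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data with one missing value in each column
toy = pd.DataFrame({'Product': ['Mortgage', None, 'Credit card'],
                    'Amount': [100.0, 250.0, None]})

non_null_counts = toy.notna().sum()  # non-null entries per column
column_types = toy.dtypes            # data type of each column

print(non_null_counts)
print(column_types)
```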

We can run `describe()` to get summary statistics for the columns (features).
We set the `include` parameter to `'object'` because the default is to describe only the numeric features.
If we change this to `include='all'`, the results will show `NaN` for statistics that do not apply to a column's data type.

In [None]:
df.describe(include = 'object')
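
The sketch below compares the three `include` settings on a made-up data frame. The name `toy` and its values are assumptions, and the text column is created with an explicit `object` dtype so that the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical toy data: one object (text) column and one numeric column
toy = pd.DataFrame({
    'Product': pd.Series(['Mortgage', 'Mortgage', 'Credit card'], dtype=object),
    'Amount': [100.0, 250.0, 75.0],
})

numeric_only = toy.describe()                 # default: numeric columns only
object_only = toy.describe(include='object')  # count, unique, top, freq
everything = toy.describe(include='all')      # both sets of statistics

print(object_only)
print(everything)
```

In `everything`, rows such as `mean` are `NaN` for `Product`, and rows such as `unique` are `NaN` for `Amount`.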