# Pandas

A guide to using the pandas API.

In [1]:
# typical import statement
import pandas as pd

## Reading Data

[Input/Output API](https://pandas.pydata.org/pandas-docs/stable/reference/io.html#input-output)

Data loading functions such as `pd.read_csv` return a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#dataframe).  
Many formats are supported, including JSON, XML, and HTML; see the Input/Output API reference linked above for the full list.

In [16]:
filename = './data_sample1.csv'
dataframe = pd.read_csv(filename)
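
The sample file itself is not included in this guide, so the following is only a sketch of the same loading step. It assumes the file holds the four rows shown in the outputs below, and it reads them from an in-memory buffer (`io.StringIO`) instead of from disk. The names `csv_text` and `sample` are hypothetical.

```python
import io

import pandas as pd

# Assumed file contents: the four rows that appear in this guide's outputs.
csv_text = """Name,Age,Location
Caesar,28,Israel
Mario,18,Canada
Pepe,42,Wonderland
Neo,30,The Matrix
"""

# read_csv accepts a path or a file-like object, so a StringIO buffer
# can stand in for './data_sample1.csv'.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # the loader returns a DataFrame
print(sample.shape)           # (number of rows, number of columns)
```

Reading from a buffer keeps the example self-contained; with a real file, passing the path as in the cell above works the same way.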

## Getting to Know the Data

The methods and attributes below are commonly used to inspect and summarize the data.

### head

By default, `head` displays the first 5 rows.  
It accepts a single integer to change the number of displayed rows.  
The sample file has only 4 rows, so the default call shows all of them.

In [14]:
dataframe.head()

|   | Name   | Age | Location   |
|---|--------|-----|------------|
| 0 | Caesar | 28  | Israel     |
| 1 | Mario  | 18  | Canada     |
| 2 | Pepe   | 42  | Wonderland |
| 3 | Neo    | 30  | The Matrix |


In [5]:
dataframe.head(2)

|   | Name   | Age | Location |
|---|--------|-----|----------|
| 0 | Caesar | 28  | Israel   |
| 1 | Mario  | 18  | Canada   |
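
As a small sketch of how the row count argument behaves, the example below rebuilds the sample data by hand (the `people` frame is a hypothetical stand-in for the CSV) and calls `head` with a positive value, a negative value, and no argument.

```python
import pandas as pd

# Hypothetical stand-in for the CSV used in this guide.
people = pd.DataFrame({
    "Name": ["Caesar", "Mario", "Pepe", "Neo"],
    "Age": [28, 18, 42, 30],
    "Location": ["Israel", "Canada", "Wonderland", "The Matrix"],
})

first_two = people.head(2)      # the first 2 rows
all_but_last = people.head(-1)  # negative n: every row except the last one
default = people.head()         # n defaults to 5, but only 4 rows exist

print(len(first_two), len(all_but_last), len(default))  # 2 3 4
```

Asking for more rows than the frame holds is not an error; `head` simply returns what is available.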


### describe

Displays summary statistics for each numeric column: count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.  
The median is the 50th percentile.  
`count` is the number of non-null values in the column; it equals the number of rows only when no values are missing.

In [6]:
dataframe.describe()

|       | Age      |
|-------|----------|
| count | 4.0      |
| mean  | 29.5     |
| std   | 9.848858 |
| min   | 18.0     |
| 25%   | 25.5     |
| 50%   | 29.0     |
| 75%   | 33.0     |
| max   | 42.0     |
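
To illustrate what `count` measures, here is a minimal sketch using a hypothetical variant of the sample data in which one age is deliberately missing. It also shows `include="all"`, which asks `describe` to summarize non-numeric columns as well.

```python
import pandas as pd

# Hypothetical variant of the sample data with one missing age.
people = pd.DataFrame({
    "Name": ["Caesar", "Mario", "Pepe", "Neo"],
    "Age": [28, 18, float("nan"), 30],
})

numeric_stats = people.describe()           # numeric columns only by default
all_stats = people.describe(include="all")  # also summarizes 'Name'

print(len(people))                          # 4 rows in the frame
print(numeric_stats.loc["count", "Age"])    # 3.0: missing values are not counted
print(list(numeric_stats.columns))          # ['Age']
print(list(all_stats.columns))              # ['Name', 'Age']
```

Comparing `count` with `len(dataframe)` is a quick way to spot columns that contain missing values.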


### List of Available Data

To see which columns the dataframe contains, use the `columns` **attribute** (not a method); it returns the column labels.

In [17]:
dataframe.columns

Index(['Name', 'Age', 'Location'], dtype='object')

In [18]:
list(dataframe.columns) # for cleaner output

['Name', 'Age', 'Location']

When a dataframe is used to build a predictive model, you need to select the **predicted** variable (the target) and the **features**.  

Features are the columns used to predict the target's value.
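
The split into target and features can be sketched as follows. The choice of `Age` as the predicted variable and of the remaining columns as features is arbitrary and only for illustration; `people`, `target`, `features`, `X`, and `y` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for the CSV used in this guide.
people = pd.DataFrame({
    "Name": ["Caesar", "Mario", "Pepe", "Neo"],
    "Age": [28, 18, 42, 30],
    "Location": ["Israel", "Canada", "Wonderland", "The Matrix"],
})

target = "Age"                                            # assumed predicted variable
features = [c for c in people.columns if c != target]     # every other column

y = people[target]     # values to predict
X = people[features]   # columns used for the prediction

print(features)        # ['Name', 'Location']
print(X.shape, y.shape)
```

Deriving `features` from `columns` avoids accidentally leaving the target among the inputs.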

### Data Series

A `Series` is like a DataFrame, except it contains a single column. Selecting one column by its label returns a Series.

In [20]:
dataframe['Age']

0    28
1    18
2    42
3    30
Name: Age, dtype: int64

In [23]:
type(dataframe['Age'])

pandas.core.series.Series
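
A short sketch of what a Series carries with it, again on a hand-built stand-in for the sample data (`people` and `ages` are hypothetical names): the column label becomes the Series name, and aggregate methods work directly on the values.

```python
import pandas as pd

# Hypothetical stand-in for the CSV used in this guide.
people = pd.DataFrame({
    "Name": ["Caesar", "Mario", "Pepe", "Neo"],
    "Age": [28, 18, 42, 30],
})

ages = people["Age"]          # a single label selects a Series

print(ages.name)              # Age
print(ages.mean())            # 29.5
print(ages.max())             # 42

# to_frame() turns the Series back into a one-column DataFrame.
ages_frame = ages.to_frame()
print(type(ages_frame).__name__)
```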

### Dataframes with a Subset of Features

In [24]:
array_of_features = ['Name', 'Age']
dataframe[array_of_features] # a list of column labels can be passed

|   | Name   | Age |
|---|--------|-----|
| 0 | Caesar | 28  |
| 1 | Mario  | 18  |
| 2 | Pepe   | 42  |
| 3 | Neo    | 30  |


In [22]:
type(dataframe[array_of_features])

pandas.core.frame.DataFrame
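
A few details of list-based selection, sketched on a hypothetical stand-in for the sample data: the result follows the order of the labels in the list, a one-element list still yields a DataFrame, and a label that is not a column raises `KeyError`. The `Height` label is made up to trigger that error.

```python
import pandas as pd

# Hypothetical stand-in for the CSV used in this guide.
people = pd.DataFrame({
    "Name": ["Caesar", "Mario", "Pepe", "Neo"],
    "Age": [28, 18, 42, 30],
    "Location": ["Israel", "Canada", "Wonderland", "The Matrix"],
})

reordered = people[["Age", "Name"]]   # columns come back in the requested order
print(list(reordered.columns))        # ['Age', 'Name']

one_column = people[["Age"]]          # a list with one label: still a DataFrame
print(type(one_column).__name__)      # DataFrame

missing_label_raised = False
try:
    people[["Name", "Height"]]        # 'Height' is not a column in this frame
except KeyError:
    missing_label_raised = True
print(missing_label_raised)           # True
```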