# Intro to Pandas

Pandas is a Python package for data analysis that provides two core
data structures: DataFrames and Series.

- [DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) store tabular data consisting of rows and columns.
- [Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) are one-dimensional labeled arrays. They behave somewhat like a Python list whose elements can also be looked up by label, as in a dict. Each column of a DataFrame is a Series.
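
To make the two structures concrete, here is a minimal sketch that builds one of each by hand. The values and the labels `a`, `b`, `c` are made up for illustration and are not taken from the loans data used later in this notebook.

```python
import pandas as pd

# A Series: one-dimensional values, each with a label in the index.
amounts = pd.Series([250, 400, 125], index=["a", "b", "c"], name="loan_amount")

# A DataFrame: a table whose columns share one index.
loans = pd.DataFrame(
    {"activity": ["Farming", "Dairy", "Retail"], "loan_amount": [250, 400, 125]},
    index=["a", "b", "c"],
)

print(amounts["b"])              # look up a value by its label
print(type(loans["activity"]))   # selecting one column gives back a Series
```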

In this notebook, we will explore the data structures that Pandas
provides, and learn how to interact with them.

### 1. Importing Pandas

To import an external Python library such as Pandas, use Python's
`import` statement. To save yourself some typing later on, you can
give the library you import an alias. Here, we are importing Pandas
and giving it the alias `pd`.

In [1]:
import pandas as pd

### 2. Creating A Dataframe and Basic Exploration
We will load a CSV file as a DataFrame using Pandas' `read_csv`
function. This will allow us to use Pandas' DataFrame methods to
explore the data in the CSV.

In [2]:
df = pd.read_csv("../../data/loans_full.zip",index_col=0)
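
To see what `index_col=0` does, here is a small sketch that reads a CSV from an in-memory string instead of a file. The three-row CSV text is hypothetical and only mimics the shape of the loans file, whose first column holds the row labels.

```python
import io
import pandas as pd

# Hypothetical CSV text; the first column has no header and holds row labels.
csv_text = """,activity,loan_amount
0,Farming,250
1,Dairy,400
2,Retail,125
"""

# With index_col=0 the first column becomes the DataFrame's index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, the column is kept as ordinary data under a generated name.
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.columns))     # ['activity', 'loan_amount']
print(list(without_index.columns))  # ['Unnamed: 0', 'activity', 'loan_amount']
```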


Once we have loaded the CSV as a DataFrame, we can start to explore
the data. Here are a few useful methods and attributes:

- `.head()`: returns the first 5 rows of the DataFrame (by default)
- `.tail()`: returns the last 5 rows of the DataFrame (by default)
- `.shape`: returns a tuple whose first element is the number of rows and whose second element is the number of columns
- `.columns`: returns the column labels of the DataFrame
- `.index`: returns the row labels of the DataFrame
- `.dtypes`: returns a Series giving the data type of each column
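
The sketch below tries these out on a small made-up DataFrame (the name `toy` and its values are assumptions for illustration), so the results are easy to check by eye before running them on the full loans data.

```python
import pandas as pd

# Six hypothetical loans, just enough rows to see head() and tail() differ.
toy = pd.DataFrame({
    "activity": ["Farming", "Dairy", "Retail", "Farming", "Poultry", "Cereals"],
    "loan_amount": [250, 400, 125, 300, 275, 150],
})

print(toy.head())         # first 5 of the 6 rows
print(toy.tail(2))        # pass a number to change how many rows you get
print(toy.shape)          # (6, 2): six rows, two columns
print(list(toy.columns))  # ['activity', 'loan_amount']
print(toy.index)          # the row labels
print(toy.dtypes)         # one data type per column
```

Note that `.head()` and `.tail()` are methods and need parentheses, while `.shape`, `.columns`, `.index` and `.dtypes` are attributes and do not.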

In [3]:
df.dtypes

activity                          object
basket_amount                    float64
bonus_credit_eligibility            bool
borrower_count                     int64
currency_exchange_loss_amount    float64
description.languages             object
funded_amount                      int64
id                                 int64
image.id                           int64
image.template_id                  int64
lender_count                       int64
loan_amount                        int64
location.country                  object
location.country_code             object
location.geo.level                object
location.geo.pairs                object
location.geo.type                 object
location.town                     object
name                              object
partner_id                       float64
planned_expiration_date           object
posted_date                       object
sector                            object
status                            object
tags            

To get basic statistics for the columns, use `.describe()` for numeric data or `.value_counts()` for categorical data.

In [4]:
df.describe()

Unnamed: 0,basket_amount,borrower_count,currency_exchange_loss_amount,funded_amount,id,image.id,image.template_id,lender_count,loan_amount,partner_id,video.id,video.thumbnailImageId
count,944.0,127900.0,24790.0,127900.0,127900.0,127900.0,127900.0,127900.0,127900.0,118262.0,76.0,76.0
mean,0.185381,1.860868,5.73573,452.238702,737753.9,1590475.0,1.0,14.370985,472.349883,165.03559,1291.697368,615530.5
std,2.145937,2.925932,12.991248,655.775669,342772.4,606689.1,0.0,19.631199,682.596785,66.0073,1027.999559,462245.0
min,0.0,1.0,0.01,0.0,251.0,409.0,1.0,0.0,25.0,6.0,150.0,297574.0
25%,0.0,1.0,0.95,225.0,441967.8,1074678.0,1.0,7.0,250.0,133.0,487.75,324494.8
50%,0.0,1.0,2.56,350.0,764079.5,1683744.0,1.0,11.0,350.0,156.0,665.5,336108.0
75%,0.0,1.0,6.54,575.0,1064133.0,2164059.0,1.0,18.0,600.0,164.0,2154.25,624790.0
max,25.0,46.0,1285.51,50000.0,1292273.0,2516905.0,1.0,1589.0,50000.0,526.0,3008.0,1754457.0


In [5]:
df['activity'].value_counts()

Farming                           26237
Dairy                              7524
General Store                      6615
Agriculture                        6402
Retail                             5346
Fruits & Vegetables                5222
Clothing Sales                     5192
Grocery Store                      4625
Poultry                            4103
Cereals                            3679
Tailoring                          2910
Motorcycle Transport               2880
Food Stall                         2730
Services                           2694
Clothing                           2322
Charcoal Sales                     2086
Beauty Salon                       2085
Fish Selling                       1778
Food Production/Sales              1699
Food                               1609
Used Clothing                      1526
Home Energy                        1449
Food Market                        1271
Livestock                          1093
Cosmetics Sales                    1014


### 3. Selecting Data - Part 1
To examine a specific column of the DataFrame, or a list of columns:

In [6]:
df['activity'].head()

0             Farming
1    Furniture Making
2         Home Energy
3       Used Clothing
4             Farming
Name: activity, dtype: object

In [7]:
df[['activity','basket_amount']].tail()

Unnamed: 0,activity,basket_amount
127895,Clothing Sales,
127896,Personal Housing Expenses,
127897,General Store,
127898,Clothing Sales,
127899,Food Production/Sales,




To examine specific rows and columns of a DataFrame, Pandas provides
the `iloc` and `loc` indexers. Use `iloc` to select by integer position (a list or range of positions), and `loc` to select by label (a list or range of labels). A range passed to `iloc` excludes its end position, while a range passed to `loc` includes its end label.

For both indexers you specify two elements inside square brackets: the first selects the rows and the second selects the columns.
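
Because the loans data uses 0, 1, 2, ... as row labels, positions and labels look the same there. The sketch below uses a made-up DataFrame with labels 10, 20, 30, 40 (an assumption for illustration) so that the difference between the two indexers is visible.

```python
import pandas as pd

toy = pd.DataFrame(
    {"activity": ["Farming", "Dairy", "Retail", "Poultry"],
     "loan_amount": [250, 400, 125, 275]},
    index=[10, 20, 30, 40],
)

# iloc works on positions: rows at positions 1 and 2, column at position 0.
# The end of each range (3 and 1) is excluded.
by_position = toy.iloc[1:3, :1]

# loc works on labels: rows labelled 20 through 30, column "activity".
# The end label (30) is included.
by_label = toy.loc[20:30, ["activity"]]

print(by_position)
print(by_label)
```

Both selections return the same two rows (labels 20 and 30), reached once by position and once by label.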

In [8]:
# Get the rows at positions 1 and 2 and the first five columns (positions 0 through 4).
df.iloc[1:3,:5]

Unnamed: 0,activity,basket_amount,bonus_credit_eligibility,borrower_count,currency_exchange_loss_amount
1,Furniture Making,0.0,False,1,
2,Home Energy,0.0,False,1,


In [9]:
# Get rows with index labels 2 through 4 (inclusive) and the columns basket_amount and activity
df.loc[2:4, ["basket_amount", "activity"]]

Unnamed: 0,basket_amount,activity
2,0.0,Home Energy
3,0.0,Used Clothing
4,0.0,Farming


In [10]:
# To see all the rows and columns:
df.iloc[:,:]

Unnamed: 0,activity,basket_amount,bonus_credit_eligibility,borrower_count,currency_exchange_loss_amount,description.languages,funded_amount,id,image.id,image.template_id,...,posted_date,sector,status,tags,themes,use,video.id,video.thumbnailImageId,video.title,video.youtubeId
0,Farming,0.0,False,1,,['en'],0,1291548,2516002,1,...,2017-05-09T00:40:03Z,Agriculture,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",,to purchase more tea leaves to sell to the tea...,,,,
1,Furniture Making,0.0,False,1,,['en'],0,1291532,2515992,1,...,2017-05-09T00:30:05Z,Manufacturing,fundraising,[],,to buy timber to make more furniture for his e...,,,,
2,Home Energy,0.0,False,1,,['en'],50,1291530,2515991,1,...,2017-05-09T00:30:04Z,Personal Use,fundraising,"[{'name': '#Eco-friendly'}, {'name': '#Technol...","['Green', 'Earth Day Campaign']",to buy a solar lantern.,,,,
3,Used Clothing,0.0,False,1,,['en'],0,1291525,2515986,1,...,2017-05-09T00:20:04Z,Clothing,fundraising,[{'name': '#Eco-friendly'}],,to buy more clothes to meet the needs and tast...,,,,
4,Farming,0.0,False,1,,['en'],0,1291518,2515975,1,...,2017-05-09T00:20:03Z,Agriculture,fundraising,[{'name': '#Woman Owned Biz'}],['Rural Exclusion'],"to buy farming inputs (fertilizers, pesticides...",,,,
5,Used Clothing,0.0,False,1,,['en'],0,1291513,2515968,1,...,2017-05-09T00:10:04Z,Clothing,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Eco-...",,to buy more bales of clothes to grow her busin...,,,,
6,Farming,25.0,False,1,,['en'],125,1291516,2515972,1,...,2017-05-09T00:10:03Z,Agriculture,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",['Rural Exclusion'],to buy seeds so that she can begin horticultur...,,,,
7,Pigs,0.0,False,1,,['en'],0,1291490,2515937,1,...,2017-05-08T23:30:09Z,Agriculture,fundraising,[{'name': '#Animals'}],,"to buy pig feeds and logs to burn charcoal, so...",,,,
8,Farming,0.0,False,1,,['en'],0,1291494,2511365,1,...,2017-05-08T23:30:05Z,Agriculture,fundraising,[],,to purchase farm inputs.,,,,
9,Cereals,0.0,False,1,,['en'],0,1291486,2515930,1,...,2017-05-08T23:20:06Z,Food,fundraising,[{'name': '#Woman Owned Biz'}],['Rural Exclusion'],to buy cereals to sell at her local market.,,,,


In [11]:
# You can also store a selection in a new variable. Selecting a single
# column by position returns a Series rather than a DataFrame.
bonus_eligibility = df.iloc[:,2]
bonus_eligibility.head()

0    False
1    False
2    False
3    False
4    False
Name: bonus_credit_eligibility, dtype: bool

### 4. Selecting Subsets of the DataFrame

A powerful feature of DataFrames is that you can view a subset of the rows based on the values in one or more columns. For example, let's say you only wanted to view loans with a status of "expired":

In [12]:
df[df['status']=='expired']

Unnamed: 0,activity,basket_amount,bonus_credit_eligibility,borrower_count,currency_exchange_loss_amount,description.languages,funded_amount,id,image.id,image.template_id,...,posted_date,sector,status,tags,themes,use,video.id,video.thumbnailImageId,video.title,video.youtubeId
2399,Agriculture,,False,1,,['en'],450,1269956,2487120,1,...,2017-04-05T22:30:06Z,Agriculture,expired,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",['Rural Exclusion'],"to buy high quality fertilizers, pesticides, a...",,,,
2410,Motorcycle Transport,,False,1,,['en'],450,1269436,2486319,1,...,2017-04-05T16:00:04Z,Transportation,expired,"[{'name': 'user_favorite'}, {'name': '#Job Cre...",,to buy another motorbike to grow his fleet.,,,,
2418,Farming,,False,1,,['en'],325,1269478,2486409,1,...,2017-04-05T14:00:03Z,Agriculture,expired,[{'name': '#Woman Owned Biz'}],['Rural Exclusion'],"to buy fertilizers, pesticides, and herbicides.",,,,
2426,Farming,,False,11,,['en'],325,1269447,2412655,1,...,2017-04-05T12:40:04Z,Agriculture,expired,"[{'name': 'user_favorite'}, {'name': '#Sustain...","['Green', 'Rural Exclusion', 'Earth Day Campai...",to buy cost-efficient maize seeds and fertiliz...,,,,
2429,General Store,,False,1,,['en'],475,1269432,2486356,1,...,2017-04-05T12:20:05Z,Retail,expired,"[{'name': '#Parent'}, {'name': '#Repeat Borrow...",,to add stock of confectioneries and soft drink...,,,,
2442,General Store,,False,1,,['en'],450,1269319,2480311,1,...,2017-04-05T01:00:03Z,Retail,expired,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",,to purchase more sodas to sell.,,,,
2469,Dairy,,False,1,,['en'],675,1269201,2486118,1,...,2017-04-04T20:40:06Z,Agriculture,expired,"[{'name': '#Woman Owned Biz'}, {'name': '#Anim...",['Rural Exclusion'],to buy animal feeds.,,,,
2965,Cereals,,False,1,,['en'],250,1262767,2477426,1,...,2017-03-28T04:40:05Z,Food,expired,[{'name': '#Parent'}],['Rural Exclusion'],the borrower buy cereals and resell them at he...,,,,
2967,Shoe Sales,,True,1,,['en'],75,1263710,2478596,1,...,2017-03-28T04:40:02Z,Retail,expired,"[{'name': 'user_favorite'}, {'name': '#Parent'...",,to buy more stock of shoes.,,,,
2969,Retail,,True,1,,['en'],325,1264027,2478922,1,...,2017-03-28T04:00:05Z,Retail,expired,"[{'name': 'user_favorite'}, {'name': '#Parent'...",,to buy more stocks of clothes.,,,,
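
The filter above works in two steps: the comparison produces a Series of True/False values, and indexing the DataFrame with that Series keeps only the rows marked True. Here is a minimal sketch of those two steps on a made-up DataFrame (the name `toy` and its values are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "status": ["expired", "funded", "fundraising", "expired"],
    "loan_amount": [450, 1200, 300, 2475],
})

# Step 1: compare every value in the column; the result is a boolean Series.
mask = toy["status"] == "expired"
print(mask.tolist())  # [True, False, False, True]

# Step 2: index the DataFrame with the mask to keep only the True rows.
expired = toy[mask]
print(expired)
```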


To view all loans with a status of "expired" `or` "fundraising":

In [13]:
df[(df['status']=='expired')|(df['status']=='fundraising')]

Unnamed: 0,activity,basket_amount,bonus_credit_eligibility,borrower_count,currency_exchange_loss_amount,description.languages,funded_amount,id,image.id,image.template_id,...,posted_date,sector,status,tags,themes,use,video.id,video.thumbnailImageId,video.title,video.youtubeId
0,Farming,0.0,False,1,,['en'],0,1291548,2516002,1,...,2017-05-09T00:40:03Z,Agriculture,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",,to purchase more tea leaves to sell to the tea...,,,,
1,Furniture Making,0.0,False,1,,['en'],0,1291532,2515992,1,...,2017-05-09T00:30:05Z,Manufacturing,fundraising,[],,to buy timber to make more furniture for his e...,,,,
2,Home Energy,0.0,False,1,,['en'],50,1291530,2515991,1,...,2017-05-09T00:30:04Z,Personal Use,fundraising,"[{'name': '#Eco-friendly'}, {'name': '#Technol...","['Green', 'Earth Day Campaign']",to buy a solar lantern.,,,,
3,Used Clothing,0.0,False,1,,['en'],0,1291525,2515986,1,...,2017-05-09T00:20:04Z,Clothing,fundraising,[{'name': '#Eco-friendly'}],,to buy more clothes to meet the needs and tast...,,,,
4,Farming,0.0,False,1,,['en'],0,1291518,2515975,1,...,2017-05-09T00:20:03Z,Agriculture,fundraising,[{'name': '#Woman Owned Biz'}],['Rural Exclusion'],"to buy farming inputs (fertilizers, pesticides...",,,,
5,Used Clothing,0.0,False,1,,['en'],0,1291513,2515968,1,...,2017-05-09T00:10:04Z,Clothing,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Eco-...",,to buy more bales of clothes to grow her busin...,,,,
6,Farming,25.0,False,1,,['en'],125,1291516,2515972,1,...,2017-05-09T00:10:03Z,Agriculture,fundraising,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",['Rural Exclusion'],to buy seeds so that she can begin horticultur...,,,,
7,Pigs,0.0,False,1,,['en'],0,1291490,2515937,1,...,2017-05-08T23:30:09Z,Agriculture,fundraising,[{'name': '#Animals'}],,"to buy pig feeds and logs to burn charcoal, so...",,,,
8,Farming,0.0,False,1,,['en'],0,1291494,2511365,1,...,2017-05-08T23:30:05Z,Agriculture,fundraising,[],,to purchase farm inputs.,,,,
9,Cereals,0.0,False,1,,['en'],0,1291486,2515930,1,...,2017-05-08T23:20:06Z,Food,fundraising,[{'name': '#Woman Owned Biz'}],['Rural Exclusion'],to buy cereals to sell at her local market.,,,,


To select loans that have expired `and` have a loan amount greater than 1000:

In [14]:
df[(df['status']=='expired')&(df['loan_amount']>1000)]

Unnamed: 0,activity,basket_amount,bonus_credit_eligibility,borrower_count,currency_exchange_loss_amount,description.languages,funded_amount,id,image.id,image.template_id,...,posted_date,sector,status,tags,themes,use,video.id,video.thumbnailImageId,video.title,video.youtubeId
2985,Restaurant,,True,1,,['en'],200,1263751,2478640,1,...,2017-03-27T21:20:03Z,Food,expired,"[{'name': '#Parent'}, {'name': '#Repeat Borrow...",,to buy stocks of ingredients for preparing foo...,,,,
3063,Beauty Salon,,True,1,,['en'],550,1263522,2478356,1,...,2017-03-27T10:20:02Z,Services,expired,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",,to buy more beauty products for salon use.,,,,
3403,General Store,,False,1,,['en'],250,1260095,2473838,1,...,2017-03-22T00:00:03Z,Retail,expired,"[{'name': '#Woman Owned Biz'}, {'name': '#Pare...",['Rural Exclusion'],"to buy more stock of sugar, bread, flour, soap...",,,,
3763,Farming,,False,16,,['en'],450,1256812,2391328,1,...,2017-03-16T23:40:05Z,Agriculture,expired,"[{'name': 'user_favorite'}, {'name': '#Sustain...","['Green', 'Rural Exclusion', 'Earth Day Campai...",to purchase a solar light and gain access to c...,,,,
3843,Farming,,False,10,,['en'],550,1255670,2392490,1,...,2017-03-15T04:10:02Z,Agriculture,expired,"[{'name': 'user_favorite'}, {'name': '#Sustain...","['Green', 'Rural Exclusion', 'Earth Day Campai...",to buy cost-efficient maize seeds and fertiliz...,,,,
3884,Farming,,False,14,,['en'],825,1254560,2392391,1,...,2017-03-14T14:10:03Z,Agriculture,expired,"[{'name': 'user_favorite'}, {'name': '#Sustain...","['Green', 'Rural Exclusion', 'Earth Day Campai...",to buy cost-efficient maize seeds and fertiliz...,,,,
3892,Farming,,False,13,,['en'],800,1254533,2388816,1,...,2017-03-14T13:00:02Z,Agriculture,expired,"[{'name': 'user_favorite'}, {'name': '#Sustain...","['Green', 'Rural Exclusion', 'Earth Day Campai...",to buy cost-efficient maize seeds and fertiliz...,,,,
3995,Retail,,True,1,,['en'],2475,1253738,2465429,1,...,2017-03-13T04:40:02Z,Retail,expired,"[{'name': 'user_favorite'}, {'name': '#Woman O...",['Rural Exclusion'],to buy more stocks of fruits and vegetables.,,,,
4029,Motorcycle Transport,,False,1,,['en'],75,1253737,2457800,1,...,2017-03-13T00:50:02Z,Transportation,expired,"[{'name': 'user_favorite'}, {'name': '#Parent'...",,to purchase spare parts and service his motorb...,,,,
4038,Motorcycle Transport,,True,1,,['en'],500,1253697,2465386,1,...,2017-03-13T00:30:03Z,Transportation,expired,"[{'name': 'user_favorite'}, {'name': '#Parent'...",,to service and maintain his motorcycle.,,,,
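
When combining conditions, use `|` for "or" and `&` for "and", and wrap each condition in parentheses, because these operators bind more tightly than `==` and `>`. The sketch below repeats both kinds of filter on a made-up DataFrame (an assumption for illustration) and also shows `isin`, which is one possible shorthand for several "or" conditions on the same column.

```python
import pandas as pd

toy = pd.DataFrame({
    "status": ["expired", "funded", "fundraising", "expired"],
    "loan_amount": [450, 1200, 300, 2475],
})

# "or": keep rows where either condition is True.
either = toy[(toy["status"] == "expired") | (toy["status"] == "fundraising")]

# The same selection written with isin().
either_isin = toy[toy["status"].isin(["expired", "fundraising"])]

# "and": keep rows where both conditions are True.
big_expired = toy[(toy["status"] == "expired") & (toy["loan_amount"] > 1000)]

print(either)
print(big_expired)
```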


## Great Resources for further information:

- [10 minute introduction to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)
- [Pandas in ipython notebooks](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/master/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb)

