# Optimising Loans

In this project, I'll practice working with chunked dataframes and optimizing a dataframe's memory usage.

I'll be working with personal loan data.

Lending Club's website lists approved loans. Qualified investors can view the borrower's credit score, the purpose of the loan, and other details in the loan applications. Once an investor is ready to back a loan, they select the amount of money they want to fund. When the loan amount the borrower requested is fully funded, the borrower receives the money, minus the origination fee that Lending Club charges.

I'll be working with a dataset of loans approved from 2007 to 2011.

The entire dataset consumes about 67 megabytes of memory. 

For this project, I'll be imagining that I only have 10 megabytes of memory available.

Let's get started...

In [19]:
import pandas as pd

Note that we can use the `nrows` parameter of `pd.read_csv` to load only the first N rows.

In [2]:
loans_5 = pd.read_csv("loans_2007.csv", nrows = 5)
loans_5

| | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | ... | last_pymnt_amnt | last_credit_pull_d | collections_12_mths_ex_med | policy_code | application_type | acc_now_delinq | chargeoff_within_12_mths | delinq_amnt | pub_rec_bankruptcies | tax_liens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1077501 | 1296599.0 | 5000.0 | 5000.0 | 4975.0 | 36 months | 10.65% | 162.87 | B | B2 | ... | 171.62 | Jun-2016 | 0.0 | 1.0 | INDIVIDUAL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1077430 | 1314167.0 | 2500.0 | 2500.0 | 2500.0 | 60 months | 15.27% | 59.83 | C | C4 | ... | 119.66 | Sep-2013 | 0.0 | 1.0 | INDIVIDUAL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1077175 | 1313524.0 | 2400.0 | 2400.0 | 2400.0 | 36 months | 15.96% | 84.33 | C | C5 | ... | 649.91 | Jun-2016 | 0.0 | 1.0 | INDIVIDUAL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1076863 | 1277178.0 | 10000.0 | 10000.0 | 10000.0 | 36 months | 13.49% | 339.31 | C | C1 | ... | 357.48 | Apr-2016 | 0.0 | 1.0 | INDIVIDUAL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1075358 | 1311748.0 | 3000.0 | 3000.0 | 3000.0 | 60 months | 12.69% | 67.79 | B | B5 | ... | 67.79 | Jun-2016 | 0.0 | 1.0 | INDIVIDUAL | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


We'll now load in 1000 rows and check that the dataframe's memory usage is under 5 MB, using the `memory_usage` method.

In [18]:
loans_1000 = pd.read_csv("loans_2007.csv", nrows = 1000)
loans_1000.memory_usage(deep = True).sum() / (1024 * 1024) # Convert to MB by dividing by 1024 * 1024

1.5502548217773438
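
As a rough sanity check, the measurement above can be turned into an estimate of how many rows fit under the 5 MB ceiling. This is only a back-of-the-envelope sketch: it assumes every row costs about as much memory as the first 1,000, which may not hold for later parts of the file. The names `budget_mb` and `max_rows` are mine, for illustration.

```python
# Measured above: the first 1,000 rows used about 1.55 MB.
mb_per_1000_rows = 1.5502548217773438
budget_mb = 5  # target ceiling per chunk (illustrative name)

# Assumption: later rows cost roughly the same as the first 1,000.
mb_per_row = mb_per_1000_rows / 1000
max_rows = int(budget_mb / mb_per_row)  # round down to stay under budget

print(f"About {max_rows} rows should fit in {budget_mb} MB")
```

By this estimate a chunk of roughly 3,000 rows would still fit, so a chunk size of 1,000 leaves comfortable headroom.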

### Exploring Data in Chunks

Now let's try to understand the columns better by working through the data in chunks.

For each of these chunks, we want to know:

1. How many columns have a numeric type? How many have a string type?

1. How many unique values are there in each string column? How many of the string columns contain values that are less than 50% unique?

1. Which float columns have no missing values and could be candidates for conversion to the integer type?

We also want to calculate the total memory usage across all of the chunks.
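
The questions above could be answered per chunk along the following lines. This is a minimal sketch, not the notebook's own solution: it runs on a tiny made-up CSV (`csv_text`) standing in for `loans_2007.csv`, it treats every non-numeric column as a string column, and the names `summaries`, `low_unique` and `float_no_missing` are hypothetical.

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for loans_2007.csv (values are illustrative only).
csv_text = """id,loan_amnt,term,emp_title,annual_inc
1,5000.0,36 months,Acme,24000.0
2,2500.0,36 months,Globex,
3,2400.0,36 months,Initech,12252.0
4,10000.0,60 months,Umbrella,49200.0
5,3000.0,60 months,Hooli,80000.0
6,5000.0,60 months,Stark,36000.0
"""

total_bytes = 0
summaries = []

for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    numeric_cols = chunk.select_dtypes(include=["number"])
    # Assumption: anything that is not numeric is treated as a string column.
    string_cols = chunk.select_dtypes(exclude=["number"])
    float_cols = chunk.select_dtypes(include=["float64"])

    # Share of unique values in each string column of this chunk.
    unique_ratio = string_cols.nunique() / len(chunk)
    low_unique = unique_ratio[unique_ratio < 0.5].index.tolist()

    # Float columns with no missing values in this chunk.
    float_no_missing = [c for c in float_cols.columns if float_cols[c].notna().all()]

    summaries.append({
        "n_numeric": numeric_cols.shape[1],
        "n_string": string_cols.shape[1],
        "low_unique": low_unique,
        "float_no_missing": float_no_missing,
    })

    # Running total of memory used across all chunks, in bytes.
    total_bytes += chunk.memory_usage(deep=True).sum()

total_mb = total_bytes / (1024 * 1024)
for summary in summaries:
    print(summary)
print(f"Total memory across chunks: {total_mb:.6f} MB")
```

Two caveats: a column only counts as a real candidate if it passes the check in every chunk, and a float column with no missing values is safe to convert to an integer type only if its values are also whole numbers.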

In [25]:
chunks_iter = pd.read_csv("loans_2007.csv", chunksize = 1000)

In [26]:
for chunk in chunks_iter:
    print(chunk)

          id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
0    1077501  1296599.0     5000.0       5000.0      4975.000000   36 months   
1    1077430  1314167.0     2500.0       2500.0      2500.000000   60 months   
2    1077175  1313524.0     2400.0       2400.0      2400.000000   36 months   
3    1076863  1277178.0    10000.0      10000.0     10000.000000   36 months   
4    1075358  1311748.0     3000.0       3000.0      3000.000000   60 months   
..       ...        ...        ...          ...              ...         ...   
995  1057629  1289394.0     2425.0       2425.0      2425.000000   36 months   
996  1057621  1289385.0     6950.0       6950.0      6950.000000   36 months   
997  1057787  1289153.0    12375.0      12375.0     12344.464785   36 months   
998  1057770  1289135.0    35000.0      35000.0     33906.194198   60 months   
999  1057275  1288835.0    14000.0      14000.0     14000.000000   60 months   

    int_rate  installment grade sub_gra

          id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
7000  874534  1089020.0    16600.0      16600.0     16186.384736   60 months   
7001  889260  1105837.0     9000.0       9000.0      8975.000000   36 months   
7002  889368  1105869.0     9600.0       9600.0      9600.000000   36 months   
7003  889342  1065047.0    35000.0      35000.0     33225.000000   60 months   
7004  889324  1105854.0     5375.0       5375.0      5375.000000   36 months   
...      ...        ...        ...          ...              ...         ...   
7995  872959  1087242.0     7650.0       7650.0      7650.000000   36 months   
7996  872990  1087210.0    13150.0       8800.0      7775.000000   60 months   
7997  866711  1080269.0     8500.0       8500.0      8500.000000   36 months   
7998  872953  1087235.0    14400.0      14400.0     14400.000000   60 months   
7999  872939  1087219.0     2200.0       2200.0      2200.000000   36 months   

     int_rate  installment grade sub_gr

           id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
12000  805706  1011777.0    10000.0      10000.0          10000.0   36 months   
12001  805692  1011761.0    14000.0      14000.0          13750.0   36 months   
12002  805663  1011728.0    35000.0      35000.0          34975.0   60 months   
12003  805646  1011709.0     4800.0       4800.0           4800.0   36 months   
12004  805605  1011667.0    15000.0      15000.0          14975.0   36 months   
...       ...        ...        ...          ...              ...         ...   
12995  789249   993110.0    10000.0      10000.0          10000.0   36 months   
12996  787054   990553.0     1000.0       1000.0           1000.0   60 months   
12997  789237   993096.0    12000.0      12000.0          12000.0   60 months   
12998  789233   993091.0    14500.0      14500.0          14500.0   36 months   
12999  789194   993047.0     7000.0       7000.0           6925.0   36 months   

      int_rate  installment

           id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
17000  723735   918759.0     6000.0       6000.0           6000.0   60 months   
17001  723675   918689.0     2400.0       2400.0           2400.0   36 months   
17002  720542   915010.0     1000.0       1000.0           1000.0   36 months   
17003  723702   918717.0    10000.0      10000.0          10000.0   60 months   
17004  718570   912807.0     4000.0       4000.0           4000.0   36 months   
...       ...        ...        ...          ...              ...         ...   
17995  596564   765791.0    18000.0      18000.0          18000.0   36 months   
17996  707856   900252.0     4500.0       4500.0           4500.0   36 months   
17997  706497   898712.0     7350.0       7350.0           7350.0   60 months   
17998  707827   900219.0    12000.0      12000.0          11950.0   36 months   
17999  703627   895650.0    19000.0      19000.0          18975.0   60 months   

      int_rate  installment

           id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
24000  605439   776716.0    12000.0       8225.0      7983.700527   36 months   
24001  606429   777950.0     1200.0       1200.0      1200.000000   36 months   
24002  606401   777912.0    12000.0       7575.0      7467.845225   36 months   
24003  606324   777817.0    20000.0      20000.0     20000.000000   36 months   
24004  606379   777887.0    10000.0      10000.0      9975.000000   36 months   
...       ...        ...        ...          ...              ...         ...   
24995  593152   761720.0    25000.0      25000.0     24882.286130   36 months   
24996  592018   760371.0     7500.0       7500.0      7450.000000   60 months   
24997  593147   761715.0     3600.0       3600.0      3600.000000   36 months   
24998  585693   752485.0    10000.0      10000.0      9875.000000   60 months   
24999  593121   761684.0    15000.0      15000.0     14975.000000   60 months   

      int_rate  installment

           id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
30000  516349   667371.0    14400.0      14400.0          14400.0   60 months   
30001  516322   667341.0    21500.0      21500.0          20775.0   36 months   
30002  516336   667356.0     5000.0       5000.0           5000.0   36 months   
30003  516327   667340.0     8000.0       8000.0           7950.0   36 months   
30004  515616   666492.0     4500.0       4500.0           4500.0   36 months   
...       ...        ...        ...          ...              ...         ...   
30995  501645   644812.0     7500.0       7500.0           7325.0   36 months   
30996  501617   644765.0    12000.0      12000.0          11225.0   36 months   
30997  499078   640383.0     8500.0       8500.0           8450.0   36 months   
30998  501621   644774.0    16000.0      16000.0          15375.0   36 months   
30999  501599   644734.0     5600.0       5600.0           5550.0   36 months   

      int_rate  installment

           id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \
37000  392890   430151.0     2775.0       2775.0      2775.000000   36 months   
37001  391892   428366.0    17500.0      17500.0     17175.000000   36 months   
37002  392826   430035.0    14000.0      14000.0     13350.000000   36 months   
37003  392814   430018.0    10000.0      10000.0      9705.425693   36 months   
37004  392787   429959.0     1500.0       1500.0      1500.000000   36 months   
...       ...        ...        ...          ...              ...         ...   
37995  369118   379239.0    13000.0      13000.0      7565.289495   36 months   
37996  369114   375913.0    14000.0      14000.0      7826.290000   36 months   
37997  368435   382966.0     7500.0       7500.0      3551.789934   36 months   
37998  369078   384383.0     6000.0       6000.0      5725.000000   36 months   
37999  369062   384337.0    15000.0      15000.0      5727.382442   36 months   

      int_rate  installment