# Part I - Prosper Loan Data Exploration
## by F Njakai



## Table of Contents

* [Introduction](#intro)
* [Preliminary Wrangling](#prel-wrangling)
* [Univariate Exploration](#univar)
* [Bivariate Exploration](#bivar)
* [Multivariate Exploration](#multivar)
* [Summary of Findings](#summary)
* [Conclusions](#outro)

<div id="intro"></div>

## Introduction

[Prosper](https://www.prosper.com/) is the first peer-to-peer lending marketplace in the United States. Borrowers apply online for a fixed-rate, fixed-term loan between USD 2,000.00 and USD 40,000.00. Individuals and institutions, Sequoia Capital for example, invest in the loans. Prosper handles all loan servicing on behalf of the borrowers and investors.

The data set explored here has 113,937 observations (one per loan) and 81 variables; the variables are described in detail in [this spreadsheet](https://docs.google.com/spreadsheets/d/1gDyi_L4UvIrLTEC6Wri5nbaMmkGmLQBk-Yx3z0XDEtI/edit#gid=0).

>**Rubric Tip**: Your code should not generate any errors, and should use functions and loops where possible to reduce repetitive code. Prefer functions for reusing code statements.

> **Rubric Tip**: Document your approach and findings in markdown cells. Use comments and docstrings in code cells to document the code functionality.

>**Rubric Tip**: Markup cells should have headers and text that organize your thoughts, findings, and what you plan on investigating next.

<div id="prel-wrangling"></div>



## Preliminary Wrangling


In [6]:
#import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from random import randint

%matplotlib inline

#### Default settings for plots

Automate the process of creating visualisations as much as possible.

Why?
* it is efficient
* visualisations will be consistent

How?
* by creating templates
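
One way to apply the template idea globally is through matplotlib's `rcParams`, so that every new figure picks up the same settings. The block below is a minimal sketch of that idea, not the approach this notebook takes. The figure size and dpi mirror the values passed to `plt.figure` in `create_fig`; the name `apply_plot_defaults` and the two font sizes are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# assumed defaults: figsize and dpi mirror `create_fig`,
# the font sizes are hypothetical choices
PLOT_DEFAULTS = {
    'figure.figsize': (10, 6.18),
    'figure.dpi': 216,
    'axes.titlesize': 14,
    'axes.labelsize': 12,
}


def apply_plot_defaults(defaults: dict = PLOT_DEFAULTS) -> None:
    """update matplotlib's global settings with the given defaults"""
    plt.rcParams.update(defaults)


apply_plot_defaults()
```

Once this has run, a plain `plt.figure()` call uses these settings without repeating them in every cell.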


In [5]:
#blue colour in `seaborn` plots
default_colour = sns.color_palette()[0]

#default figsize and dpi for a `Figure` obj
default_figsize = (10, 6.18)
default_dpi = 216


In [9]:
def create_fig(x_lab: str, y_lab: str, title: str) -> None:
    """
    simple function to create a `Figure` object
    that contains an x-lab, y-lab and title:

    "Father Figure", if you like.

    3 params, all type `str`:
    x_lab, y_lab and title

    return: None
    """
    try:
        #fig size
        plt.figure(figsize=(10, 6.18), dpi=216)
        #x-axis name
        plt.xlabel(x_lab)
        #y-axis name
        plt.ylabel(y_lab)
        #title
        plt.title(title)
    except Exception:
        print('Failed to create template')
        raise
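
A possible variant of the template, sketched here with synthetic data, returns the `Figure` and `Axes` so that later cells can keep customising the plot. The name `create_fig_ax` and the data are hypothetical; only the figure size and dpi are taken from `create_fig`.

```python
import numpy as np
import matplotlib.pyplot as plt


def create_fig_ax(x_lab: str, y_lab: str, title: str):
    """hypothetical variant of `create_fig` that returns the `Figure` and `Axes`"""
    fig, ax = plt.subplots(figsize=(10, 6.18), dpi=216)
    ax.set_xlabel(x_lab)
    ax.set_ylabel(y_lab)
    ax.set_title(title)
    return fig, ax


# synthetic data, for illustration only
rng = np.random.default_rng(42)
values = rng.normal(loc=0, scale=1, size=500)

fig, ax = create_fig_ax('value', 'count', 'Distribution of synthetic values')
ax.hist(values, bins=30)
```

Returning the `Axes` avoids relying on matplotlib's implicit "current figure", which helps when a cell draws more than one plot.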


> Load in your dataset and describe its properties through the questions below. Try and motivate your exploration goals through this section.


In [10]:
#see if a df was loaded and holds data

def confirm_exists(df):
    """
    a simple function to see if `df` is a
    non-empty pandas DataFrame

    takes in 1 param: the df itself

    Please do not pass the arg as a string

    return: None
    """
    if isinstance(df, pd.DataFrame) and not df.empty:
        print('This dataframe exists')
        return
    print('This dataframe does not exist or is empty')
    



In [None]:
#load the data set
df = pd.read_csv('', sep=',')
confirm_exists(df)
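
The file path in the cell above is left blank, so the sketch below uses a tiny in-memory CSV instead of a real file. It shows one way to combine loading and checking in a single step; `load_csv`, `sample_csv` and the column names are made up for illustration and are not part of the Prosper data set.

```python
import io

import pandas as pd


def load_csv(source) -> pd.DataFrame:
    """read a CSV and fail early if it contains no rows"""
    data = pd.read_csv(source, sep=',')
    if data.empty:
        raise ValueError('the CSV was read, but it contains no rows')
    return data


# stand-in for a file on disk: an in-memory CSV with made-up columns
sample_csv = io.StringIO('loan_id,amount\n1,2000\n2,40000\n')
sample_df = load_csv(sample_csv)
print(sample_df.shape)
```

Raising an exception, rather than printing a message, stops the notebook at the cell where the problem occurs.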

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.duplicated().value_counts()

In [None]:
df.sample(randint(5, 15))

### Structure

#### Overall

* 113,937 observations
* 81 variables
    * a of type `int`
    * b of type `float`
    * c of type `bool`
    * d of type `str`
* more...

#### Missing and null values

* `df` has x missing or null values

#### Duplicated observations

* `df` has x duplicated observations

#### Multiple values for a variable

* observations in `df` have x values per variable
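
The counts listed in this section could be collected in one pass. The following is a rough sketch on a synthetic frame; `summarise_structure`, `toy` and its columns are hypothetical names, not part of the data set.

```python
import numpy as np
import pandas as pd


def summarise_structure(data: pd.DataFrame) -> dict:
    """collect the counts that the "Structure" section asks for"""
    return {
        'observations': data.shape[0],
        'variables': data.shape[1],
        # number of columns per dtype, keyed by the dtype's name
        'dtype_counts': data.dtypes.astype(str).value_counts().to_dict(),
        # total number of missing cells across all columns
        'missing_values': int(data.isna().sum().sum()),
        # rows that repeat an earlier row exactly
        'duplicated_observations': int(data.duplicated().sum()),
    }


# synthetic frame with made-up columns, for illustration only
toy = pd.DataFrame({
    'amount': [2000, 5000, 5000, 40000],
    'rate': [0.12, 0.20, 0.20, np.nan],
    'is_open': [True, False, False, True],
    'status': ['Current', 'Completed', 'Completed', None],
})
summary = summarise_structure(toy)
print(summary)
```

Calling `summarise_structure(df)` on the loaded data would give the numbers needed to fill in the placeholders above.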



### What is/are the main feature(s) of interest in your dataset?

> Your answer here!

### What features in the dataset do you think will help support your investigation into your feature(s) of interest?

> Your answer here!

<div id="univar"></div>

## Univariate Exploration

> In this section, investigate distributions of individual variables. If
you see unusual points or outliers, take a deeper look to clean things up
and prepare yourself to look at relationships between variables.
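
To take a closer look at unusual points, one common convention is the interquartile range (IQR) rule. The sketch below is an illustration of that rule on synthetic values, not a method prescribed by this notebook; `flag_outliers_iqr` and the multiplier `k = 1.5` are assumptions.

```python
import pandas as pd


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """return a boolean mask that is True where a value lies
    outside [Q1 - k * IQR, Q3 + k * IQR]"""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# synthetic values, for illustration only
amounts = pd.Series([2000, 2500, 3000, 3500, 4000, 4500, 40000])
mask = flag_outliers_iqr(amounts)
print(amounts[mask])
```

A flagged value is not necessarily an error; the mask only marks points that deserve a second look before deciding whether to keep, transform or drop them.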


> **Rubric Tip**: The project (Part I alone) should have at least 15 visualizations distributed over univariate, bivariate, and multivariate plots to explore many relationships in the data set. Use reasoning to justify the flow of the exploration.



>**Rubric Tip**: Use the "Question-Visualization-Observations" framework throughout the exploration. This framework involves **asking a question of the data, creating a visualization to find answers, and then recording observations after each visualization.**




>**Rubric Tip**: Visualizations should depict the data appropriately so that the plots are easily interpretable. You should choose an appropriate plot type, data encodings, and formatting as needed. The formatting may include setting/adding the title, labels, legend, and comments. Also, do not overplot or incorrectly plot ordinal data.
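
The point about ordinal data can be made concrete with pandas' ordered categorical type. This is a small sketch with hypothetical levels (`low`, `medium`, `high`); the actual ordinal variables in the data set and their levels would need to be checked against the variable definitions.

```python
import pandas as pd

# hypothetical ordinal levels, listed from lowest to highest
levels = ['low', 'medium', 'high']
ordered_type = pd.CategoricalDtype(categories=levels, ordered=True)

# synthetic ratings, for illustration only
ratings = pd.Series(['high', 'low', 'medium', 'low', 'high', 'high'])
ratings = ratings.astype(ordered_type)

# sort_index follows the declared order, not the alphabet or the frequency
counts = ratings.value_counts().sort_index()
print(counts)
```

Plotting `counts` as a bar chart then keeps the bars in their natural order.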

### Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

> Your answer here!

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

> Your answer here!

<div id="bivar"></div>

## Bivariate Exploration

> In this section, investigate relationships between pairs of variables in your
data. Make sure the variables that you cover here have been introduced in some
fashion in the previous section (univariate exploration).
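
Before plotting pairs of variables, a quick numerical check can suggest which pairs are worth a closer look. The sketch below uses synthetic data with made-up column names (`amount`, `rate`, `term`), so the numbers say nothing about the Prosper loans themselves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# synthetic, made-up variables: `rate` falls as `amount` rises, plus noise
pairs = pd.DataFrame({'amount': rng.uniform(2000, 40000, size=n)})
pairs['rate'] = 0.30 - 5e-6 * pairs['amount'] + rng.normal(0, 0.01, size=n)
pairs['term'] = rng.choice([12, 36, 60], size=n)

# numeric vs numeric: Pearson correlation coefficient
corr = pairs['amount'].corr(pairs['rate'])

# numeric vs categorical: one summary statistic per group
mean_rate_by_term = pairs.groupby('term')['rate'].mean()

print(corr)
print(mean_rate_by_term)
```

A scatter plot would be the natural follow-up for the first pair, and a box or violin plot for the second.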

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

> Your answer here!

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

> Your answer here!

<div id="multivar"></div>

## Multivariate Exploration

> Create plots of three or more variables to investigate your data even
further. Make sure that your investigations are justified, and follow from
your work in the previous sections.
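
One way to look at three variables at once is to aggregate a numeric variable over every combination of two categorical ones. The following sketch does this on synthetic data; the column names and levels are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# synthetic, made-up variables, for illustration only
multi = pd.DataFrame({
    'term': rng.choice([12, 36, 60], size=n),
    'rating': rng.choice(['low', 'medium', 'high'], size=n),
    'rate': rng.uniform(0.05, 0.35, size=n),
})

# mean of the numeric variable for every (rating, term) combination
grid = multi.pivot_table(index='rating', columns='term',
                         values='rate', aggfunc='mean')
print(grid)
```

The resulting table can be passed to `sns.heatmap(grid, annot=True)` to show the third variable as colour.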

### Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

> Your answer here!

### Were there any interesting or surprising interactions between features?

> Your answer here!

<div id="summary"></div>

## Summary of Findings

<div id="outro"></div>

## Conclusions
>You can write a summary of the main findings and reflect on the steps taken during the data exploration.



> Remove all Tips mentioned above before you convert this notebook to PDF/HTML.


> At the end of your report, make sure that you export the notebook as an
HTML file from the `File > Download as... > HTML or PDF` menu. Make sure you keep
track of where the exported file goes, so you can put it in the same folder
as this notebook for project submission. Also, make sure you remove all of
the quote-formatted guide notes like this one before you finish your report!

