# Exploratory Analysis
### Notebook created by [Bright Cape](https://brightcape.nl/)
*Author: Maurits Akkerman*

This notebook walks you through an example of an exploratory analysis. It repeats some of the steps from the notebook on [Data Cleaning & Visualization](https://colab.research.google.com/github/MauritsAkkerman/AppliedDataScience/blob/main/Data%20Cleaning%20%26%20Visualization.ipynb), this time on your own dataset. Combine this exploratory analysis with what you learned in the Applied Data Science Bootcamp to gain meaningful insights into your data.

# Contents <a name="Contents"></a>

* [Installing Key Package](#Installing-Key-Package)

* [Upload your File](#Upload-your-File)

* [Running the Exploratory Analysis](#Running-the-Exploratory-Analysis)


# Installing Key Package<a name="Installing-Key-Package"></a>
Before you can upload your data, we need to install a package on Google Colab. Run the following lines of code:

In [None]:
# Installs the Pandas Profiling module necessary for the exploratory analysis
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

After you have successfully installed this package, the runtime needs to be restarted. Follow these steps:
1. At the top of your screen, click the Runtime tab.
2. Click Restart Runtime (CTRL + M .).
3. A message box will ask whether you are sure; press "YES".

Now you're set to proceed with the exploratory analysis.
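
If you want to confirm that the installation worked after the restart, a quick check like the one below may help. It is a minimal sketch: `is_installed` is a hypothetical helper name (not part of any package), and it assumes the package is importable under the module name `pandas_profiling`, as used later in this notebook.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package installed above is importable as "pandas_profiling"
print("pandas_profiling installed:", is_installed("pandas_profiling"))
```

If this prints `False`, it is probably worth rerunning the installation cell and restarting the runtime once more.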

# Upload your File<a name="Upload-your-File"></a>
[[ go back to the top ]](#Contents)

Run the following code. Once it has run, you can upload your own file; go ahead and do so.

In [None]:
from google.colab import files
uploaded = files.upload()
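
`files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes. The read step further down decodes those bytes as UTF-8, which raises an error for files saved in another encoding. The sketch below shows one possible fallback, under stated assumptions: `decode_upload` is a hypothetical helper, `uploaded_example` is a made-up stand-in for the real `uploaded` dictionary, and Latin-1 is only a guess at a plausible second encoding.

```python
def decode_upload(raw_bytes, encodings=("utf-8", "latin-1")):
    """Decode uploaded bytes, trying each encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode the file with any of: " + ", ".join(encodings))


# Made-up stand-in for the dictionary returned by files.upload(): file name -> raw bytes
uploaded_example = {"people.csv": "id;name\n1;José\n".encode("latin-1")}

file_name = next(iter(uploaded_example))  # name of the first uploaded file
text = decode_upload(uploaded_example[file_name])
print(file_name, "->", repr(text))
```

Latin-1 accepts any byte sequence, so it works as a last resort, but it may show wrong characters if the file actually uses yet another encoding.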

To check whether the upload was successful, we print the first rows of the dataframe.

In [None]:
import io
import pandas as pd

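# Try common delimiters until the file splits into more than one column
for delim in [',', ';', ' ']:
  # With index_col=0, a wrong delimiter leaves only the index and no data columns
  data = pd.read_csv(io.StringIO(uploaded[list(uploaded.keys())[0]].decode('utf-8')), delimiter=delim, index_col=0)
  if data.shape[1] != 0:
    break

data.head()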
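
The loop above keeps the first delimiter that produces at least one data column next to the index. The same idea can be sketched with the standard library alone, which makes the logic easier to see. This is an illustration rather than a replacement for the cell above: `detect_delimiter` is a hypothetical helper, and it only inspects the header row.

```python
import csv
import io


def detect_delimiter(text, candidates=(",", ";", " ")):
    """Return the first candidate that splits the header row into more than one field."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("the file is empty")
    for delim in candidates:
        # Parse only the header row with the candidate delimiter
        fields = next(csv.reader(io.StringIO(lines[0]), delimiter=delim))
        if len(fields) > 1:
            return delim
    raise ValueError("none of the candidate delimiters split the header row")


# Made-up sample: a semicolon-separated file
sample = "id;name;age\n1;Ann;30\n2;Bob;25\n"
print(repr(detect_delimiter(sample)))
```

As in the loop above, the order of the candidates matters: a header such as `first name;age` contains a space, but the semicolon is tried first and wins.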

# Running the Exploratory Analysis<a name="Running-the-Exploratory-Analysis"></a>
[[ go back to the top ]](#Contents)

Now let's run the exploratory analysis. The code below uses a package called Pandas Profiling to extract key observations from your data without any manual coding, so no coding experience is needed. After running the code, you should see a progress bar; once it finishes, you can click through the results of the analysis interactively.

In [None]:
import pandas_profiling as ppf

# Build the report: full-width HTML, selected missing-value diagrams, Pearson correlation only
profile = ppf.ProfileReport(
    data,
    title="Pandas Profile",
    html={'style': {'full_width': True}},
    missing_diagrams={
        'bar': True,
        'matrix': True,
        'heatmap': True,
        'dendrogram': False,
    },
    correlations={
        'pearson': {
            'calculate': True,
            'warn_high_correlations': True,
            'threshold': 0.9,
        },
        'spearman': {'calculate': False, 'warn_high_correlations': False},
        'phi_k': {'calculate': False, 'warn_high_correlations': False},
        'cramers': {'calculate': False, 'warn_high_correlations': False},
        'kendall': {'calculate': False, 'warn_high_correlations': False},
    },
)
# Display the interactive report inside the notebook
profile.to_notebook_iframe()
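
The configuration above enables only the Pearson correlation and sets a threshold of 0.9 for high-correlation warnings. To get a feel for what that threshold means, the sketch below computes the Pearson coefficient by hand for two small, made-up columns. The helper names are hypothetical, and the rule used here (a warning when the absolute coefficient exceeds the threshold) is an assumption; the exact rule applied by the report may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def is_highly_correlated(xs, ys, threshold=0.9):
    """Assumed rule: flag the pair when |r| exceeds the threshold."""
    return abs(pearson(xs, ys)) > threshold


# Made-up columns: the second rises almost linearly with the first
height = [1, 2, 3, 4, 5]
weight = [2, 4, 6, 8, 10.5]
print(is_highly_correlated(height, weight))
```

Pairs of columns flagged in this way carry largely the same information, which is worth keeping in mind when you interpret the warnings in the report.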

Write down the conclusions you can draw and any interesting findings you discover in the data. We can then discuss these findings during the Applied Data Science Bootcamp!