In [1]:
%%html
<style>
table {display:block}  /* left-align HTML tables */
</style>

In [2]:
# import libraries
from IPython.display import Image  # for displaying images in markdown cells
import pandas as pd  # Dataframe manipulation
import matplotlib.pyplot as plt  # Plot charts
import matplotlib.style as style  # Inherit styles
from matplotlib.pyplot import figure  # to adjust plot figure size
import numpy as np  # Arrays manipulation 
from numpy import mean, std  # mean, standard deviation

# Enables Jupyter to display graphs
%matplotlib inline

# Dataquest - Intermediate Statistics: Averages And Variability <br/> <br/> Project Title: Finding The Best Markets To Advertise In

## 1) Finding The Best Two Markets to Advertise In

#### Key skills applied in project:
- How to summarize distributions using the mean, the median, and the mode.
- How to measure the variability of a distribution using the range, the mean absolute deviation, the variance, and the standard deviation.
- How to locate any value in a distribution using z-scores.
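
As a quick illustration of the measures listed above, here is a minimal sketch using only Python's standard `statistics` module. The sample values are hypothetical and chosen so the arithmetic is easy to check by hand; they are not taken from the survey. The population variants (`pvariance`, `pstdev`) are used, which matches the default behavior of `numpy.std`.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical sample (not survey data), picked for easy arithmetic
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Summarizing the distribution
center = mean(values)        # arithmetic mean
middle = median(values)      # middle value of the sorted sample
most_common = mode(values)   # most frequent value

# Measuring variability
value_range = max(values) - min(values)
mean_abs_dev = sum(abs(v - center) for v in values) / len(values)
variance = pvariance(values)  # population variance
std_dev = pstdev(values)      # population standard deviation


def z_score(value, mu, sigma):
    """Number of standard deviations that `value` lies from the mean `mu`."""
    return (value - mu) / sigma


# Locating a value in the distribution
z_of_max = z_score(max(values), center, std_dev)

print(center, middle, most_common)
print(value_range, mean_abs_dev, variance, std_dev)
print(z_of_max)
```

A z-score of 2 for the largest value means it sits two standard deviations above the mean of this sample.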

#### Background
Provided by: [Dataquest.io](https://www.dataquest.io/)

Let's assume that we're working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find the two best markets to advertise our product in.

## 2) Understanding the Data 
Provided by: [Dataquest.io](https://www.dataquest.io/)

To reach our goal, we could organize surveys in a couple of different markets to find out which would be the best choices for advertising. This is very costly, however, and it's a good call to explore cheaper options first.

We can try to search for existing data that might be relevant to our purpose. One good candidate is the data from [freeCodeCamp's 2017 New Coder Survey](https://medium.freecodecamp.org/we-asked-20-000-people-who-they-are-and-how-theyre-learning-to-code-fff5d668969). [freeCodeCamp](https://www.freecodecamp.org/) is a free e-learning platform that offers courses on web development. Because they run a [popular Medium publication](https://medium.freecodecamp.org/) (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.



#### Metadata:
**2017-fCC-New-Coders-Survey-Data.csv**
- The survey data is publicly available in [this GitHub repository](https://github.com/freeCodeCamp/2017-new-coder-survey).
- Metadata details are in the [JSON file here](https://raw.githubusercontent.com/freeCodeCamp/2017-new-coder-survey/master/clean-data/datapackage.json).

| Column | Title | Question Asked | Datatype context |
| --- | --- | --- | --- |
| Age | Age of Individual | How old are you? | integer |
| AttendedBootcamp | Attended a Bootcamp | Have you attended a full time coding bootcamp? | boolean |
| BootcampFinish | Finished a Bootcamp or Not | Have you finished [your coding bootcamp]? | boolean |
| BootcampLoanYesNo | Bootcamp Loan Yes or No | Did you take out a loan to pay for the bootcamp? | boolean |
| BootcampName | Bootcamp Name | Which [coding bootcamp]? | string |
| BootcampRecommend | Bootcamp Recommendation | | |
| ChildrenNumber | | | |
| CityPopulation | | | |
| CodeEventConferences | | | |
| CodeEventDjangoGirls | | | |
| CodeEventFCC | | | |
| CodeEventGameJam | | | |
| CodeEventGirlDev | | | |
| CodeEventHackathons | | | |
| CodeEventMeetup | | | |
| CodeEventNodeSchool | | | |
| CodeEventNone | | | |
| CodeEventOther | | | |
| CodeEventRailsBridge | | | |
| CodeEventRailsGirls | | | |
| CodeEventStartUpWknd | | | |
| CodeEventWkdBootcamps | | | |
| CodeEventWomenCode | | | |
| CodeEventWorkshops | | | |
| CommuteTime | | | |
| CountryCitizen | | | |
| CountryLive | | | |
| EmploymentField | | | |
| EmploymentFieldOther | | | |
| EmploymentStatus | | | |
| EmploymentStatusOther | | | |
| ExpectedEarning | | | |
| FinanciallySupporting | | | |
| FirstDevJob | | | |
| Gender | | | |


In [61]:
# Alternative way to present the metadata quickly:
# read the JSON data package and pretty-print it
import json
import pprint  # to print nice nested dictionaries

# Opening JSON file
with open('datapackage.json') as f:
    data = json.load(f)
    
print(json.dumps(data, indent=1))

{
 "name": "2017-new-coder-survey",
 "title": "2017 New Coder Survey",
 "description": "freeCodeCamp's new coder data including survey data on demographic information and coding related activities",
 "homepage": "https://github.com/freeCodeCamp/2017-new-coder-survey",
 "version": "0.0.1",
 "license": "Open Database License",
 "author": {
  "name": "Eric Leung",
  "web": "https://erictleung.com"
 },
 "resources": [
  {
   "name": "2017-fcc-new-coders-survey-data.csv",
   "title": "2017 fCC New Coders Survey Data",
   "description": "A single combined file, comprised of the two-part new coders' survey conducted by freeCodeCamp.",
   "schema": {
    "fields": [
     {
      "name": "Age",
      "title": "Age of Individual",
      "description": "The question asked was \"How old are you?\"",
      "type": "integer"
     },
     {
      "name": "AttendedBootcamp",
      "title": "Attended a Bootcamp",
      "description": "The question asked was \"Have you attended a full time coding bootca


In [64]:
# Source: https://data-dive.com/jupyterlab-markdown-cells-include-variables
# Instead of setting the cell to Markdown, create Markdown from within a code cell!
# We can just use Python variable replacement syntax to make the text dynamic
#
# Also, to wrap lines: see the instructions in the above link to configure
# settings in Jupyter Notebook
#
# More documentation on creating an HTML table using IPython.display:
# https://www.cs.put.poznan.pl/wjaskowski/pub/teaching/kck/lectures/notebooks/ipython-notebook.html

from IPython.display import Markdown as md

md("The data consists of x observations. Bla, Bla, ....")

The data consists of x observations. Bla, Bla, ....
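
The placeholder sentence above can be made dynamic with an f-string. The following is a minimal sketch: `describe_shape` is a hypothetical helper name and the counts passed to it are made-up numbers, not the survey's actual dimensions.

```python
def describe_shape(n_rows: int, n_cols: int) -> str:
    """Build a Markdown sentence from row and column counts."""
    return (
        f"The data consists of **{n_rows:,}** observations "
        f"and **{n_cols}** columns."
    )


# Hypothetical counts, used only to show the formatting
sentence = describe_shape(1000, 5)
print(sentence)
```

In the notebook, passing the string to `md(...)` renders it as Markdown, for example `md(describe_shape(*df.shape))` once a DataFrame `df` has been loaded.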

In [126]:
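# More documentation on creating an HTML table using IPython.display:
# https://www.cs.put.poznan.pl/wjaskowski/pub/teaching/kck/lectures/notebooks/ipython-notebook.html
# Note: HTML() displays the string as-is. The Jinja tags below are not
# evaluated unless the template is rendered first, so they appear literally.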
from IPython.display import HTML
s = """<table>
<tr>
<th>Column</th>
<th>Title</th>
<th>Question Asked</th>
<th>Datatype context</th>
</tr>
<!-- Comment: use Jinja to loop in html -->
{% for row in dict %}
<tr>
<td>{{ row['name'] }}</td>
<td>{{ row['title'] }}</td>
<td>{{ row['description'] }}</td>
<td>{{ row['type'] }}</td>
</tr>
{% endfor %}
</table>"""
h = HTML(s); h

Column,Title,Question Asked,Datatype context
{{ row['name'] }},{{ row['title'] }},{{ row['description'] }},{{ row['type'] }}
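
The output above shows the Jinja placeholders literally, because `IPython.display.HTML` does not evaluate template syntax. One way around this, sketched below under stated assumptions, is to build the rows in plain Python before handing the string to `HTML`. The helper name `fields_to_html` and the one-entry `sample_fields` list are illustrative only; the entry mirrors the `Age` field from the metadata.

```python
from html import escape


def fields_to_html(fields):
    """Render a list of field descriptors as an HTML table string."""
    header = (
        "<tr><th>Column</th><th>Title</th>"
        "<th>Question Asked</th><th>Datatype context</th></tr>"
    )
    rows = []
    for field in fields:
        cells = (
            field.get("name", ""),
            field.get("title", ""),
            field.get("description", ""),
            field.get("type", ""),
        )
        # Escape each value so quotes and angle brackets display safely
        row = "".join(f"<td>{escape(str(cell))}</td>" for cell in cells)
        rows.append(f"<tr>{row}</tr>")
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


# Illustrative input mirroring the first metadata field
sample_fields = [
    {
        "name": "Age",
        "title": "Age of Individual",
        "description": 'The question asked was "How old are you?"',
        "type": "integer",
    }
]

table_html = fields_to_html(sample_fields)
print(table_html)
```

In the notebook, `HTML(fields_to_html(data['resources'][0]['schema']['fields']))` would display the full table, assuming `data` has been loaded from `datapackage.json` as in the earlier cell.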


In [127]:
# Extract the list of field descriptors (one dictionary per survey column).
# Named `fields` rather than `dict` to avoid shadowing the built-in type.
# for field in fields:
#     print(field['name'], field['title'], field['description'], field['type'])

fields = data['resources'][0]['schema']['fields']
fields

[{'name': 'Age',
  'title': 'Age of Individual',
  'description': 'The question asked was "How old are you?"',
  'type': 'integer'},
 {'name': 'AttendedBootcamp',
  'title': 'Attended a Bootcamp',
  'description': 'The question asked was "Have you attended a full time coding bootcamp?"',
  'type': 'boolean'},
 {'name': 'BootcampFinish',
  'title': 'Finished a Bootcamp or Not',
  'description': 'The question asked was "Have you finished [your coding bootcamp]?"',
  'type': 'boolean'},
 {'name': 'BootcampLoanYesNo',
  'title': 'Bootcamp Loan Yes or No',
  'description': 'The question asked was "Did you take out a loan to pay for the bootcamp?"',
  'type': 'boolean'},
 {'name': 'BootcampName',
  'title': 'Bootcamp Name',
  'description': 'The question asked was "Which [coding bootcamp]?"',
  'type': 'string'},
 {'name': 'BootcampRecommend',
  'title': 'Bootcamp Recommendation',
  'description': 'The question asked was "Based on your experience, would you recommend this bootcamp to your fr

In [12]:
# read and load df
df = pd.read_csv('2017-fCC-New-Coders-Survey-Data.txt', sep=',')

# review df
df

Unnamed: 0,Age,AttendedBootcamp,BootcampFinish,BootcampLoanYesNo,BootcampName,BootcampRecommend,ChildrenNumber,CityPopulation,CodeEventConferences,CodeEventDjangoGirls,...,YouTubeFCC,YouTubeFunFunFunction,YouTubeGoogleDev,YouTubeLearnCode,YouTubeLevelUpTuts,YouTubeMIT,YouTubeMozillaHacks,YouTubeOther,YouTubeSimplilearn,YouTubeTheNewBoston
0,27.0,0.0,,,,,,more than 1 million,,,...,,,,,,,,,,
1,34.0,0.0,,,,,,"less than 100,000",,,...,1.0,,,,,,,,,
2,21.0,0.0,,,,,,more than 1 million,,,...,,,,1.0,1.0,,,,,
3,26.0,0.0,,,,,,"between 100,000 and 1 million",,,...,1.0,1.0,,,1.0,,,,,
4,20.0,0.0,,,,,,"between 100,000 and 1 million",,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18170,41.0,0.0,,,,,1.0,more than 1 million,,,...,,,,,,,,never see,,
18171,31.0,0.0,,,,,1.0,more than 1 million,,,...,1.0,,,,,,,,,
18172,39.0,0.0,,,,,3.0,more than 1 million,,,...,1.0,,,,,,,,,1.0
18173,54.0,0.0,,,,,3.0,"between 100,000 and 1 million",,,...,1.0,,,1.0,,,1.0,,,
