# [Guided Project: Finding the best market to advertise in](https://app.dataquest.io/m/310/guided-project%3A-finding-the-best-markets-to-advertise-in)

-------

## Intro: Where should we spend money to make more money?

Advertising can be a very powerful weapon. The problem is how to wield it and get the most out of the money invested in it. Here we look for the markets that would yield the most profit for the investment.

To do this we need data. The [freeCodeCamp 2017 New Coder Survey](https://github.com/freeCodeCamp/2017-new-coder-survey/blob/master/clean-data/2017-fCC-New-Coders-Survey-Data.csv) is a suitable candidate for us to look into.

To solve this, we want the answers to:
1. Where are the potential new coders located?
2. What locations have the greatest number of new coders?
3. How much are they willing to spend on learning?

In [1]:
# Importing packages for data management 
import pandas as pd    # Importing pandas
import numpy as np     # Importing numpy
import datetime as dt  # Importing datetime
import re              # Importing regular expression
import warnings        # To suppress warning alert
warnings.filterwarnings('ignore')
# Change display settings so wide dataframes are not truncated
pd.options.display.max_rows = 500
pd.options.display.width = 500
pd.options.display.max_colwidth = 500
pd.options.display.max_columns = 500

In [4]:
# Reading in the csv and displaying the dataframe 
ncs = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv")
ncs.head(5)

Unnamed: 0,Age,AttendedBootcamp,BootcampFinish,BootcampLoanYesNo,BootcampName,BootcampRecommend,ChildrenNumber,CityPopulation,CodeEventConferences,CodeEventDjangoGirls,CodeEventFCC,CodeEventGameJam,CodeEventGirlDev,CodeEventHackathons,CodeEventMeetup,CodeEventNodeSchool,CodeEventNone,CodeEventOther,CodeEventRailsBridge,CodeEventRailsGirls,CodeEventStartUpWknd,CodeEventWkdBootcamps,CodeEventWomenCode,CodeEventWorkshops,CommuteTime,CountryCitizen,CountryLive,EmploymentField,EmploymentFieldOther,EmploymentStatus,EmploymentStatusOther,ExpectedEarning,FinanciallySupporting,FirstDevJob,Gender,GenderOther,HasChildren,HasDebt,HasFinancialDependents,HasHighSpdInternet,HasHomeMortgage,HasServedInMilitary,HasStudentDebt,HomeMortgageOwe,HoursLearning,ID.x,ID.y,Income,IsEthnicMinority,IsReceiveDisabilitiesBenefits,IsSoftwareDev,IsUnderEmployed,JobApplyWhen,JobInterestBackEnd,JobInterestDataEngr,JobInterestDataSci,JobInterestDevOps,JobInterestFrontEnd,JobInterestFullStack,JobInterestGameDev,JobInterestInfoSec,JobInterestMobile,JobInterestOther,JobInterestProjMngr,JobInterestQAEngr,JobInterestUX,JobPref,JobRelocateYesNo,JobRoleInterest,JobWherePref,LanguageAtHome,MaritalStatus,MoneyForLearning,MonthsProgramming,NetworkID,Part1EndTime,Part1StartTime,Part2EndTime,Part2StartTime,PodcastChangeLog,PodcastCodeNewbie,PodcastCodePen,PodcastDevTea,PodcastDotNET,PodcastGiantRobots,PodcastJSAir,PodcastJSJabber,PodcastNone,PodcastOther,PodcastProgThrowdown,PodcastRubyRogues,PodcastSEDaily,PodcastSERadio,PodcastShopTalk,PodcastTalkPython,PodcastTheWebAhead,ResourceCodecademy,ResourceCodeWars,ResourceCoursera,ResourceCSS,ResourceEdX,ResourceEgghead,ResourceFCC,ResourceHackerRank,ResourceKA,ResourceLynda,ResourceMDN,ResourceOdinProj,ResourceOther,ResourcePluralSight,ResourceSkillcrush,ResourceSO,ResourceTreehouse,ResourceUdacity,ResourceUdemy,ResourceW3S,SchoolDegree,SchoolMajor,StudentDebtOwe,YouTubeCodeCourse,YouTubeCodingTrain,YouTubeCodingTut360,YouTubeComputerphile,YouTubeDerekBanas,YouTubeDevTips,YouTubeEngineeredTruth,YouTubeFCC,YouTubeFunFunFunction,YouTubeGoogleDev,YouTubeLearnCode,YouTubeLevelUpTuts,YouTubeMIT,YouTubeMozillaHacks,YouTubeOther,YouTubeSimplilearn,YouTubeTheNewBoston
0,27.0,0.0,,,,,,more than 1 million,,,,,,,,,,,,,,,,,15 to 29 minutes,Canada,Canada,software development and IT,,Employed for wages,,,,,female,,,1.0,0.0,1.0,0.0,0.0,0.0,,15.0,02d9465b21e8bd09374b0066fb2d5614,eb78c1c3ac6cd9052aec557065070fbf,,,0.0,0.0,0.0,,,,,,,,,,,,,,,start your own business,,,,English,married or domestic partnership,150.0,6.0,6f1fbc6b2b,2017-03-09 00:36:22,2017-03-09 00:32:59,2017-03-09 00:59:46,2017-03-09 00:36:26,,,,1.0,,,,,,,,,,,,,,1.0,,,,,,1.0,,,,1.0,,,,,,,,1.0,1.0,"some college credit, no degree",,,,,,,,,,,,,,,,,,,
1,34.0,0.0,,,,,,"less than 100,000",,,,,,,,,,,,,,,,,,United States of America,United States of America,,,Not working but looking for work,,35000.0,,,male,,,1.0,0.0,1.0,0.0,0.0,1.0,,10.0,5bfef9ecb211ec4f518cfc1d2a6f3e0c,21db37adb60cdcafadfa7dca1b13b6b1,,0.0,0.0,0.0,,Within 7 to 12 months,,,,,,1.0,,,,,,,,work for a nonprofit,1.0,Full-Stack Web Developer,in an office with other developers,English,"single, never married",80.0,6.0,f8f8be6910,2017-03-09 00:37:07,2017-03-09 00:33:26,2017-03-09 00:38:59,2017-03-09 00:37:10,,1.0,,,,,,,,,,,,,,,,1.0,,,1.0,,,1.0,,,,,,,,,1.0,,,1.0,1.0,"some college credit, no degree",,,,,,,,,,1.0,,,,,,,,,
2,21.0,0.0,,,,,,more than 1 million,,,,,,1.0,,1.0,,,,,,,,,15 to 29 minutes,United States of America,United States of America,software development and IT,,Employed for wages,,70000.0,,,male,,,0.0,0.0,1.0,,0.0,,,25.0,14f1863afa9c7de488050b82eb3edd96,21ba173828fbe9e27ccebaf4d5166a55,13000.0,1.0,0.0,0.0,0.0,Within 7 to 12 months,1.0,,,1.0,1.0,1.0,,,1.0,,,,,work for a medium-sized company,1.0,"Front-End Web Developer, Back-End Web Developer, DevOps / SysAdmin, Mobile Developer, Full-Stack Web Developer",no preference,Spanish,"single, never married",1000.0,5.0,2ed189768e,2017-03-09 00:37:58,2017-03-09 00:33:53,2017-03-09 00:40:14,2017-03-09 00:38:02,1.0,,1.0,,,,,,,Codenewbie,,,,,1.0,,,1.0,,,1.0,,,1.0,,,,1.0,,,,,,,1.0,1.0,,high school diploma or equivalent (GED),,,,,1.0,,1.0,1.0,,,,,1.0,1.0,,,,,
3,26.0,0.0,,,,,,"between 100,000 and 1 million",,,,,,,,,,,,,,,,,I work from home,Brazil,Brazil,software development and IT,,Employed for wages,,40000.0,0.0,,male,,0.0,1.0,1.0,1.0,1.0,0.0,0.0,40000.0,14.0,91756eb4dc280062a541c25a3d44cfb0,3be37b558f02daae93a6da10f83f0c77,24000.0,0.0,0.0,0.0,1.0,Within the next 6 months,1.0,,,,1.0,1.0,,,,,,,,work for a medium-sized company,,"Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer",from home,Portuguese,married or domestic partnership,0.0,5.0,dbdc0664d1,2017-03-09 00:40:13,2017-03-09 00:37:45,2017-03-09 00:42:26,2017-03-09 00:40:18,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,,,,1.0,,,,,1.0,,,,,"some college credit, no degree",,,,,,,,1.0,,1.0,1.0,,,1.0,,,,,
4,20.0,0.0,,,,,,"between 100,000 and 1 million",,,,,,,,,,,,,,,,,,Portugal,Portugal,,,Not working but looking for work,,140000.0,,,female,,,0.0,0.0,1.0,,0.0,,,10.0,aa3f061a1949a90b27bef7411ecd193f,d7c56bbf2c7b62096be9db010e86d96d,,0.0,0.0,0.0,,Within 7 to 12 months,1.0,,,,1.0,1.0,,1.0,1.0,,,,,work for a multinational corporation,1.0,"Full-Stack Web Developer, Information Security, Mobile Developer, Front-End Web Developer, Back-End Web Developer",in an office with other developers,Portuguese,"single, never married",0.0,24.0,11b0f2d8a9,2017-03-09 00:42:45,2017-03-09 00:39:44,2017-03-09 00:45:42,2017-03-09 00:42:50,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,bachelor's degree,Information Technology,,,,,,,,,,,,,,,,,,


In [6]:
# go_describe: return a transposed DataFrame.describe() extended with NaN counts, mode and median
def go_describe(df, data='num'):
    '''Return a transposed pandas describe table with the count and
    share of NaN values, sorted by NaN count (descending).
    Parameters: df = pandas.DataFrame,
    data = 'num' (numeric summary columns) or 'cat' (categorical summary columns)'''
    # Pick the output columns first, so an invalid option fails early
    if data == 'num':
        order = ['25%', '50%', '75%', 'count', 'count_nan', 'share_nan', 'mean', 'median', 'mode', 'std', 'min', 'max']
    elif data == 'cat':
        order = ['count', 'count_nan', 'share_nan', 'mode']
    else:
        raise ValueError("data must be 'num' or 'cat'")
    df_des = df.describe(include='all').transpose()
    # Share (in percent) and absolute number of missing values per column
    df_des['share_nan'] = (1 - df_des['count'] / df.shape[0]) * 100
    df_des['count_nan'] = (df.shape[0] - df_des['count']).astype(int)
    df_des['count'] = df_des['count'].astype(int)
    # First mode of every column; median of the numeric columns only
    df_des['mode'] = df.mode(axis='index', numeric_only=False).iloc[0, :]
    df_des['median'] = df.median(axis='index', numeric_only=True)
    # Columns with the most missing values come first
    return df_des[order].sort_values(by=['count_nan'], ascending=False)

go_describe(ncs)

Unnamed: 0,25%,50%,75%,count,count_nan,share_nan,mean,median,mode,std,min,max
GenderOther,,,,55,18120,99.6974,,,genderfluid,,,
CodeEventRailsGirls,1.0,1.0,1.0,132,18043,99.2737,1.0,1.0,1,0.0,1.0,1.0
CodeEventRailsBridge,1.0,1.0,1.0,133,18042,99.2682,1.0,1.0,1,0.0,1.0,1.0
CodeEventDjangoGirls,1.0,1.0,1.0,165,18010,99.0922,1.0,1.0,1,0.0,1.0,1.0
PodcastGiantRobots,1.0,1.0,1.0,187,17988,98.9711,1.0,1.0,1,0.0,1.0,1.0
YouTubeSimplilearn,1.0,1.0,1.0,201,17974,98.8941,1.0,1.0,1,0.0,1.0,1.0
JobInterestOther,,,,266,17909,98.5365,,,Undecided,,,
CodeEventGameJam,1.0,1.0,1.0,290,17885,98.4044,1.0,1.0,1,0.0,1.0,1.0
CodeEventGirlDev,1.0,1.0,1.0,297,17878,98.3659,1.0,1.0,1,0.0,1.0,1.0
PodcastTheWebAhead,1.0,1.0,1.0,311,17864,98.2889,1.0,1.0,1,0.0,1.0,1.0


## Looking at the job role interest

In [15]:
pd.DataFrame(ncs['JobRoleInterest'].value_counts(normalize=True)*100)

Unnamed: 0,JobRoleInterest
Full-Stack Web Developer,11.770595
Front-End Web Developer,6.435927
Data Scientist,2.173913
Back-End Web Developer,2.030892
Mobile Developer,1.673341
...,...
"Data Engineer, Back-End Web Developer, Game Developer, Data Scientist, Full-Stack Web Developer",0.014302
"Data Engineer, Data Scientist, User Experience Designer",0.014302
"Data Scientist, User Experience Designer, Front-End Web Developer, Full-Stack Web Developer",0.014302
"Back-End Web Developer, Data Engineer, Mobile Developer, Game Developer, User Experience Designer, Full-Stack Web Developer, Front-End Web Developer",0.014302


## Page three
Figure out whether the sample we have is representative of our population of interest.
The JobRoleInterest column describes, for every participant, the role(s) they'd be interested in working in.
Generate a frequency distribution table for this column. Take percentages instead of absolute frequencies.
Analyze the table.
- Are people interested in only one subject, or can they be interested in more than one subject?
- If most people are interested in more than one subject, is this sample still representative?
- The focus of our courses is on web and mobile development. How many people are interested in at least one of these two subjects?

Generate at least one graph while you're working on these steps to help the reader understand more easily what you're doing.
Use Markdown cells to explain to readers what you're doing.
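
The steps above could be sketched roughly as follows. This is only a minimal sketch: the `toy` rows are made up for illustration (they are not survey results), and the names `n_roles_pct` and `web_or_mobile_pct` are hypothetical. With the real data you would use `ncs` in place of `toy`.

```python
import pandas as pd

# Made-up rows for illustration only; replace `toy` with the survey dataframe `ncs`
toy = pd.DataFrame({
    'JobRoleInterest': [
        'Full-Stack Web Developer',
        'Front-End Web Developer, Mobile Developer',
        'Data Scientist',
        None,
        'Data Engineer, Data Scientist',
    ]
})

# Work only with respondents who answered the question
roles = toy['JobRoleInterest'].dropna()

# Number of roles each respondent listed (roles are separated by commas)
n_roles = roles.str.split(',').str.len()
n_roles_pct = n_roles.value_counts(normalize=True).sort_index() * 100

# Share of respondents interested in at least one web or mobile role
web_or_mobile = roles.str.contains('Web Developer|Mobile Developer')
web_or_mobile_pct = web_or_mobile.mean() * 100

print(n_roles_pct)
print(web_or_mobile_pct)
```

A bar chart of `n_roles_pct` (for example via `n_roles_pct.plot.bar()`, which needs matplotlib installed) would be one way to cover the graph requirement.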

## Page four
To make sure you're working with a representative sample, drop all the rows where participants didn't answer what role they are interested in. Where a participant didn't respond, we can't know for sure what their interests are, so it's better if we leave out this category of participants.
- Generate a frequency table for the CountryLive variable.
- Generate both absolute and relative frequencies.
- Analyze the results.
- Based on the results, what are the two markets you'd choose for advertisement?

Can we stop the analysis here, or do we need to go more in depth?
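
One possible way to build the frequency table is sketched below. The `toy` rows are made up for illustration, and the column labels `absolute` and `percentage` are hypothetical choices. With the real data you would start from `ncs`.

```python
import pandas as pd

# Made-up rows for illustration only; replace `toy` with the survey dataframe `ncs`
toy = pd.DataFrame({
    'JobRoleInterest': ['Full-Stack Web Developer', None, 'Data Scientist',
                        'Mobile Developer', 'Front-End Web Developer'],
    'CountryLive': ['United States of America', 'Canada', 'India',
                    'United States of America', 'Canada'],
})

# Keep only respondents who stated at least one role of interest
answered = toy.dropna(subset=['JobRoleInterest'])

# Absolute and relative frequencies side by side
country_freq = pd.DataFrame({
    'absolute': answered['CountryLive'].value_counts(),
    'percentage': answered['CountryLive'].value_counts(normalize=True) * 100,
})
print(country_freq)
```

Keeping both columns in one table makes it easier to see whether a high percentage is backed by a meaningful number of respondents.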

## Page five 
Create a new column that describes the amount of money a student has spent per month (at the moment they completed the survey).

You'll need to divide the MoneyForLearning column by the MonthsProgramming column.
Some students answered that they had been learning to code for 0 months (it might be that they had just started when they completed the survey). To avoid dividing by 0, replace all the values of 0 with 1.
Find out how many null values there are in the new column (the column describing the amount of money students spend per month).

- Keep only the rows that don't have a null value for the new column.
- Also remove any rows that have null values in the CountryLive column.
- Group the remaining data by the CountryLive column and find out how much money a student spends on average each month in the US, India, the United Kingdom and Canada.
- You can use the DataFrame.groupby() method.
- As a summary metric, we recommend choosing the mean to take into account all values in the distributions. You can also compute the median or the mode to see how they compare with the mean.

Analyze the results. Is there anything in the results that looks off?
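
A minimal sketch of these steps, assuming the new column is called `money_per_month` (a hypothetical name) and using made-up `toy` rows rather than survey data:

```python
import pandas as pd

# Made-up rows for illustration only; replace `toy` with the survey dataframe
toy = pd.DataFrame({
    'CountryLive': ['United States of America', 'United States of America',
                    'India', 'Canada', None],
    'MoneyForLearning': [100.0, 300.0, 0.0, None, 50.0],
    'MonthsProgramming': [0.0, 3.0, 5.0, 2.0, 1.0],
})

# Replace 0 months with 1 to avoid dividing by zero
toy['MonthsProgramming'] = toy['MonthsProgramming'].replace(0, 1)
toy['money_per_month'] = toy['MoneyForLearning'] / toy['MonthsProgramming']

# Number of null values in the new column
n_null = toy['money_per_month'].isnull().sum()

# Keep rows that have both a spending value and a country
clean = toy.dropna(subset=['money_per_month', 'CountryLive'])

# Mean monthly spending for the four countries of interest
countries = ['United States of America', 'India', 'United Kingdom', 'Canada']
mean_by_country = (
    clean[clean['CountryLive'].isin(countries)]
    .groupby('CountryLive')['money_per_month']
    .mean()
)
print(n_null)
print(mean_by_country)
```

Swapping `.mean()` for `.median()` on the same grouped column gives the comparison suggested above.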

## Page six
Generate four box plots on the same figure to visualize for each country (the US, India, the United Kingdom, Canada) the distribution of the variable that describes how much money each participant had spent per month.

Can you spot extreme outliers for India, Canada or the United Kingdom?
If not, what extreme outliers can you spot?
Eliminate the extreme outliers.

Recompute the mean values, just like we did in the previous screen: group the data by the CountryLive column, and then find out how much money a student spends on average each month in the US, India, the United Kingdom and Canada.

If the mean values still look off, look for more extreme outliers. For instance, you can find a couple of people in India who spend $5000 per month. Isolate these respondents and examine their answers to other questions in the survey to figure out whether these large learning expenses are justified. For example, check whether they attended a bootcamp, which might justify the large amount of money spent.
If you find more extreme outliers, remove them, and recompute the mean values.
If you get stuck, you can always sneak a look at the solution notebook.
Is it clear enough at this point what the two best countries to choose for advertisement are?
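
The outlier handling could look roughly like the sketch below. It assumes a per-month spending column, here hypothetically named `money_per_month`, and uses made-up `toy` rows. The cut-off `THRESHOLD` is an arbitrary illustrative value, not one taken from the survey; in practice you would pick it after looking at the box plots.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'CountryLive': ['United States of America'] * 4 + ['India'] * 3,
    'money_per_month': [50.0, 100.0, 150.0, 50000.0, 0.0, 30.0, 5000.0],
    'AttendedBootcamp': [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

# Hypothetical cut-off for "extreme" values
THRESHOLD = 20000

# Step 1: drop the extreme values and recompute the means
trimmed = toy[toy['money_per_month'] < THRESHOLD]
means_after = trimmed.groupby('CountryLive')['money_per_month'].mean()

# Step 2: isolate the remaining big spenders in India for a closer look
india_outliers = trimmed[
    (trimmed['CountryLive'] == 'India') & (trimmed['money_per_month'] >= 5000)
]

# Step 3: drop those who did not attend a bootcamp, then recompute
unjustified = india_outliers[india_outliers['AttendedBootcamp'] == 0]
trimmed = trimmed.drop(unjustified.index)
final_means = trimmed.groupby('CountryLive')['money_per_month'].mean()

print(means_after)
print(final_means)
```

For the plot itself, `DataFrame.boxplot(column='money_per_month', by='CountryLive')` is one option when matplotlib is installed.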

## Page seven
Try to choose the second market to advertise in.

Remember that we sell subscriptions at a price of $59 per month.
Make sure you also consider the number of potential customers in each country.
Based on all of the results you've found so far, brainstorm a couple of possible decisions.

- Does it make sense to advertise in more than two countries?
- Does it make sense to split the advertising budget unequally (e.g.: spend 70% to advertise in the US and 30% to advertise in India)?
- Does it make sense to advertise only in the US?
- If we had a marketing team in our company, would it be better to just send them our results and let them use their domain knowledge to make the best decision?
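
To weigh spending against the number of potential customers, one rough approach is to count, per country, how many respondents already spend at least the subscription price each month. This is a sketch under assumptions: current spending is only a proxy for willingness to pay, the `toy` rows are made up, and the column names `money_per_month`, `respondents`, `can_afford` and `can_afford_pct` are hypothetical.

```python
import pandas as pd

# Subscription price per month, as stated above
PRICE = 59

# Made-up rows for illustration only
toy = pd.DataFrame({
    'CountryLive': ['United States of America'] * 4 + ['India'] * 3 + ['Canada'] * 2,
    'money_per_month': [0.0, 60.0, 100.0, 20.0, 0.0, 70.0, 10.0, 59.0, 10.0],
})

# Respondents whose monthly spending already covers the subscription
can_afford = toy['money_per_month'] >= PRICE

summary = pd.DataFrame({
    'respondents': toy.groupby('CountryLive').size(),
    'can_afford': can_afford.groupby(toy['CountryLive']).sum(),
})
summary['can_afford_pct'] = summary['can_afford'] / summary['respondents'] * 100
print(summary)
```

Looking at the absolute count and the percentage together helps when deciding whether to split the budget or to concentrate on a single market.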

## Page eight
In this project, we analyzed survey data from new coders to find the best two markets to advertise in. The only solid conclusion we reached is that the US would be a good market to advertise in.

For the second-best market, the choice between India and Canada wasn't clear-cut. We decided to send the results to the marketing team so they can use their domain knowledge to make the best decision.

You might have reached different conclusions, which is totally fine, as long as you constructed sound reasoning for those conclusions. Try to wrap up your work by writing a conclusion section that has no more than two paragraphs.

You can also continue working on this project. Next steps include:

- Finding other criteria for choosing the best market.
- Analyzing other data sets:
  - [freeCodeCamp's 2016 New Coders Survey](https://github.com/freeCodeCamp/2016-new-coder-survey).
  - [Stack Overflow 2018 Developer Survey](https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey).


[SOLUTION](https://github.com/dataquestio/solutions/blob/master/Mission310Solutions.ipynb)