# Finding the Best Two Markets to Advertise In

<p style="text-align:center;">
  <img src="e-learning.jpg" width="700" height="100">
  <br>
  Source: <a href="https://unsplash.com/">Unsplash</a>
</p>


## Introduction

This data analysis project seeks the best advertising strategy for an e-learning company that specializes in programming courses. The company's courses span web and mobile development, data science, game development, and more. Our goal is to identify the two most promising markets in which to invest advertising funds, using existing data and analytical tools to give the company actionable insights for its advertising decisions.

## Understanding the Data

To determine the most effective markets for advertising our programming courses, we could run surveys in several markets. That approach is expensive, however, so we should explore cheaper options first.

One alternative is to look for relevant data that already exists. A promising source is [the 2017 New Coder Survey](https://medium.freecodecamp.org/we-asked-20-000-people-who-they-are-and-how-theyre-learning-to-code-fff5d668969), conducted by [freeCodeCamp](https://www.freecodecamp.org/), a free e-learning platform specializing in web development courses. The survey, which received responses from more than 20,000 people, was published on the popular [Medium publication of freeCodeCamp](https://medium.freecodecamp.org/), which has over 400,000 followers. It attracted not only people interested in web development but also new coders with diverse interests, which makes it a valuable resource for our analysis.

The survey data is publicly available in [this GitHub repository](https://github.com/freeCodeCamp/2017-new-coder-survey).

We begin by importing the libraries needed for the analysis.

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
pd.options.display.max_columns = 150  # to avoid truncated output

We will now quickly explore the `2017-fCC-New-Coders-Survey-Data.csv` file, which is stored in the `clean-data` folder of the repository mentioned earlier. Alternatively, we can use the direct link provided [here](https://raw.githubusercontent.com/freeCodeCamp/2017-new-coder-survey/master/clean-data/2017-fCC-New-Coders-Survey-Data.csv) to read in the file.
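As a minimal sketch of the direct-link option, the file can be read straight from the URL given above. The helper name `load_survey` and the constant `SURVEY_URL` are hypothetical and introduced only for illustration; reading from the URL assumes network access and that the link is still live.

```python
import pandas as pd

# Direct link quoted in the text above.
SURVEY_URL = (
    "https://raw.githubusercontent.com/freeCodeCamp/2017-new-coder-survey/"
    "master/clean-data/2017-fCC-New-Coders-Survey-Data.csv"
)


def load_survey(source=SURVEY_URL):
    """Read the survey CSV from a URL or a local path (hypothetical helper)."""
    # low_memory=False lets pandas infer each column's dtype from the whole
    # file instead of chunk by chunk, which avoids mixed-type warnings.
    return pd.read_csv(source, low_memory=False)


# Usage (requires network access for the default URL):
# fcc_survey = load_survey()
# fcc_survey = load_survey('2017-fCC-New-Coders-Survey-Data.csv')  # local copy
```

Because `pd.read_csv` accepts both URLs and file paths, the same call works for a downloaded copy of the file.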

In [13]:
# Read the survey data into a DataFrame and view the first five rows
fcc_survey = pd.read_csv('2017-fCC-New-Coders-Survey-Data.csv', low_memory=False)
fcc_survey.head()

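| Unnamed: 0 | Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | ... | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 27.0 | 0.0 | | | | | | more than 1 million | | | ... | | | | | | | | | | |
| 1 | 34.0 | 0.0 | | | | | | less than 100,000 | | | ... | 1.0 | | | | | | | | | |
| 2 | 21.0 | 0.0 | | | | | | more than 1 million | | | ... | | | | 1.0 | 1.0 | | | | | |
| 3 | 26.0 | 0.0 | | | | | | between 100,000 and 1 million | | | ... | 1.0 | 1.0 | | | 1.0 | | | | | |
| 4 | 20.0 | 0.0 | | | | | | between 100,000 and 1 million | | | ... | | | | | | | | | | |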


In [14]:
print(f'The survey consists of {fcc_survey.shape[0]} rows and {fcc_survey.shape[1]} columns.')

The survey consists of 18175 rows and 136 columns.


Next, we will try to understand what each column represents. Most column names are self-explanatory, but no clear documentation explains what each one means. We can, however, find more information in the [datapackage.json](https://github.com/freeCodeCamp/2017-new-coder-survey/blob/master/clean-data/datapackage.json) file in the `clean-data` folder of the previously mentioned [repository](https://github.com/freeCodeCamp/2017-new-coder-survey). The initial survey questions are available there, which helps us infer what each column describes.
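One way to look up column meanings programmatically is sketched below. It assumes that `datapackage.json` follows the common Data Package layout (`resources` → `schema` → `fields`, each field carrying a `name` and an optional `description`); this has not been checked against the actual file, so the keys may need adjusting. The helper `column_descriptions` and the sample content are hypothetical.

```python
import json


def column_descriptions(datapackage):
    """Map each field name to its description (hypothetical helper).

    Assumes the layout resources -> schema -> fields, where every field
    has a "name" key and, optionally, a "description" key.
    """
    descriptions = {}
    for resource in datapackage.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            # Fields without a description map to an empty string.
            descriptions[field["name"]] = field.get("description", "")
    return descriptions


# Made-up sample in the assumed layout, NOT the real file contents.
sample = json.loads("""
{
  "resources": [
    {"schema": {"fields": [
      {"name": "Age", "description": "Example question text"},
      {"name": "CityPopulation"}
    ]}}
  ]
}
""")

for name, description in column_descriptions(sample).items():
    print(f"{name}: {description or '(no description)'}")
```

With a downloaded copy of the real file, the same function could be applied to the result of `json.load(open('datapackage.json'))`, provided the layout assumption holds.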