Welcome to your DataCamp project audition! This notebook must be filled out and vetted before a contract can be signed and you can start creating your project.

The first step is forking the repository in which this notebook lives. After that, there are two parts to be completed in this notebook:

- **Project information**:  The title of the project, a project description, etc.

- **Project introduction**: The first three text and code cells that will form the introduction of your project.

When complete, please email the link to your forked repo to projects@datacamp.com with the email subject line _DataCamp project audition_. If you have any questions, please reach out to projects@datacamp.com.

# Project information

**Project title**: The title of the project. Maximum 41 characters.

**Name:** Your full name.

**Email address associated with your DataCamp account:** You can find this email [here](https://www.datacamp.com/profile/account_settings) if you have a DataCamp account.

**Project description**: This will be read by the students on the DataCamp platform **before** deciding to start the project. The description should be three paragraphs, written in Markdown.

- Paragraph 1 should be an exciting introduction to the analysis/model/etc. students will complete.
- Paragraph 2 should list the background knowledge you assume the student doing this project will have; the more specific, the better. Please list things like modules, tools, functions, methods, statistical concepts, etc.
- Paragraph 3 should describe and link to (if possible) the dataset used in the project.

# Project introduction

***Note: nothing needs to be filled out in this cell. It is simply setting up the template cells below.***

The final output of a DataCamp project looks like a blog post: pairs of text and code cells that tell a story about data. The text is written from the perspective of the data analyst and *not* from the perspective of an instructor on DataCamp. So, for this blog post intro, all you need to do is pretend you're writing a blog post -- forget the part about instructors and students.

Below you'll see the structure of a DataCamp project: a series of "tasks" where each task consists of a title, a **single** text cell, and a **single** code cell. There are 8-12 tasks in a project and each task can have up to 10 lines of code. What you need to do:
1. Read through the template structure.
2. As best you can, divide your project, as you currently envision it, into tasks.
3. Fill out the template structure for the first three tasks of your project.

As you are completing each task, you may wish to consult the project notebook format in our [documentation](https://instructor-support.datacamp.com/projects/datacamp-projects-jupyter-notebook). Only the `@context` and `@solution` cells are relevant to this audition.
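The size limits stated in the template cells (titles of at most 55 characters, context cells of at most 1200 characters for the first task and 800 for later ones, and up to 10 lines of code not counting comments) can be checked mechanically. The sketch below is only one possible self-check; `check_task` and its parameters are hypothetical names and are not part of any DataCamp tooling.

```python
def check_task(title, context, code, max_context=800):
    """Return a list of limit violations for one task (hypothetical helper)."""
    problems = []
    if len(title) > 55:
        problems.append(f"title has {len(title)} characters (limit 55)")
    if len(context) > max_context:
        problems.append(
            f"context has {len(context)} characters (limit {max_context})"
        )
    # Count code lines, skipping blank lines and full-line comments.
    code_lines = [
        line for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if len(code_lines) > 10:
        problems.append(f"code has {len(code_lines)} lines (limit 10)")
    return problems


# The first task allows a longer context cell (1200 characters).
print(check_task("Loading the network", "A short intro.",
                 "import pandas as pd", max_context=1200))
```

An empty list means the task fits within the limits. Counting characters with `len` is an assumption about how the limits are measured.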

## 1. Title of the first task  (<= 55 chars) (sentence case)

An exciting intro to the analysis. Provide context on the problem you're going to solve, the dataset(s) you're going to use, the relevant industry, etc. You may wish to briefly introduce the techniques you're going to use. Tell a story to get students excited! It should have at most 1200 characters.

The most common error instructors make in **context cells** is referring to the student or the project. We want project notebooks to appear as a blog post or a data analysis. Bad: *"In this project, you will..."* Good: *"In this notebook, we will..."*

The first task in a project often involves loading data. Please store any data files you use in the `datasets/` folder in this repository.

Images are welcome additions to every Markdown cell, but especially this first one. Make sure the images you use have a [permissive license](https://support.google.com/websearch/answer/29508?hl=en) and display them using [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#images). Store your images in the `img/` folder in this repository.


OK, we are ready to go. Let's start by importing the CSV data into a pandas DataFrame. pandas is a great Python library for data processing.

In [8]:
import pandas as pd
marvel_data = pd.read_csv("datasets/hero_network.csv")
marvel_data.head()

| Unnamed: 0 | id1 | id2 |
|---|---|---|
| 0 | LITTLE, ABNER | PRINCESS ZANDA |
| 1 | LITTLE, ABNER | BLACK PANTHER/T'CHAL |
| 2 | BLACK PANTHER/T'CHAL | PRINCESS ZANDA |
| 3 | LITTLE, ABNER | PRINCESS ZANDA |
| 4 | LITTLE, ABNER | BLACK PANTHER/T'CHAL |
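
The `Unnamed: 0` header in the output above may just be the DataFrame's row index, or it may be a row-number column stored in the file itself. As a sketch under the second assumption, the example below rebuilds the five rows shown above as an in-memory CSV (a stand-in for `datasets/hero_network.csv`, whose exact layout is not confirmed here) and passes `index_col=0` so that the row numbers become the index instead of a data column. It also shows that quoted names containing a comma, such as `"LITTLE, ABNER"`, are parsed as a single field.

```python
import io

import pandas as pd

# Hypothetical stand-in for datasets/hero_network.csv, built from the five
# rows shown above. The blank first header field is an assumption about how
# the file stores its row numbers.
sample_csv = io.StringIO(
    ',id1,id2\n'
    '0,"LITTLE, ABNER",PRINCESS ZANDA\n'
    '1,"LITTLE, ABNER",BLACK PANTHER/T\'CHAL\n'
    '2,BLACK PANTHER/T\'CHAL,PRINCESS ZANDA\n'
    '3,"LITTLE, ABNER",PRINCESS ZANDA\n'
    '4,"LITTLE, ABNER",BLACK PANTHER/T\'CHAL\n'
)

# index_col=0 uses the first column as the row index rather than as data.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())  # → ['id1', 'id2']
```

If the real file has no such column, plain `pd.read_csv("datasets/hero_network.csv")` as used above is already sufficient.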


## 2. Title of the second task (<= 55 chars)  (sentence case)

Context / background / story / etc. This cell should have at most 800 characters.

The most common error instructors make in **context cells** is referring to the student or the project. We want project notebooks to appear as a blog post or a data analysis. Bad: *"In this task, you will..."* Good: *"Next, we will..."*

In [12]:
marvel_data["id1"].value_counts()

CAPTAIN AMERICA         8149
SPIDER-MAN/PETER PAR    6652
IRON MAN/TONY STARK     5850
THOR/DR. DONALD BLAK    5712
THING/BENJAMIN J. GR    5369
WOLVERINE/LOGAN         5230
SCARLET WITCH/WANDA     5184
VISION                  5067
HUMAN TORCH/JOHNNY S    4970
MR. FANTASTIC/REED R    4788
INVISIBLE WOMAN/SUE     4723
BEAST/HENRY &HANK& P    4628
HAWK                    4506
CYCLOPS/SCOTT SUMMER    4492
WASP/JANET VAN DYNE     4452
STORM/ORORO MUNROE S    4170
COLOSSUS II/PETER RA    3997
PROFESSOR X/CHARLES     3973
ANT-MAN/DR. HENRY J.    3727
MARVEL GIRL/JEAN GRE    3667
HULK/DR. ROBERT BRUC    3648
ICEMAN/ROBERT BOBBY     3271
WONDER MAN/SIMON WIL    3252
ANGEL/WARREN KENNETH    3229
NIGHTCRAWLER/KURT WA    3084
ROGUE /                 2918
DR. STRANGE/STEPHEN     2915
SHE-HULK/JENNIFER WA    2797
PATRIOT/JEFF MACE       2791
JAMESON, J. JONAH       2791
                        ... 
SYNARIO/ANGELA BRADF       1
FAHE                       1
INFINITY THRALL            1
DIETZ, SUSAN  
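
`value_counts()` returns one row per distinct value, sorted from most to least frequent, which is why the listing above is long. A minimal sketch of trimming it with `head()`, using only the five `id1` values from the first task's output rather than the full dataset:

```python
import pandas as pd

# The five id1 values shown in the first task's output (not the full file).
id1 = pd.Series(
    [
        "LITTLE, ABNER",
        "LITTLE, ABNER",
        "BLACK PANTHER/T'CHAL",
        "LITTLE, ABNER",
        "LITTLE, ABNER",
    ],
    name="id1",
)

# Counts are sorted in descending order; head(n) keeps the n most frequent.
counts = id1.value_counts()
top = counts.head(1)
print(top.index[0], top.iloc[0])  # → LITTLE, ABNER 4
```

On the full DataFrame the same idea would read `marvel_data["id1"].value_counts().head(10)`. Note that this counts appearances in the `id1` column only.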

## 3. Title of the third task (<= 55 chars)  (sentence case)

Context / background / story / etc. This cell should have at most 800 characters.

The most common error instructors make in **context cells** is referring to the student or the project. We want project notebooks to appear as a blog post or a data analysis. Bad: *"In this task, you will..."* Good: *"Next, we will..."*

In [3]:
# Code and comments for the third task
# It should consist of up to 10 lines of code (not including comments)
# and take at most 10 seconds to execute on an average laptop.

*Stop here! Only the first three tasks. :)*