## Preparing our environment

by Anaïs Pepey

In [None]:
import pandas as pd
import seaborn as sns

## Importing data

👇 Load the `insecticides.csv` dataset into this notebook as a pandas dataframe, and display its first 5 rows.

🌏 This dataset includes data on the efficacy of insecticides against different species of <i>Anopheles</i> mosquitoes and is kindly provided by the WHO through their data download [page](https://apps.who.int/malaria/maps/threats/#/download).

In [None]:
df_01 = pd.read_csv("./data/workshop/insecticides.csv")
df_01.head()
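
Here is a quick sketch of what `head()` returns. It runs on a small, made-up CSV that is read from a string with `io.StringIO`. The values and the `type_1`/`type_2` test labels are hypothetical, not WHO data. By default `head()` returns the first five rows. Pass a number to get more or fewer, or use `tail()` to get the last rows instead.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file (hypothetical values, not WHO data)
csv_text = """INSECTICIDE_CLASS,TEST_TYPE,MORTALITY_ADJUSTED
Pyrethroids,type_1,85
Pyrethroids,type_2,92
Organophosphates,type_1,100
Carbamates,type_1,97
Organochlorines,type_2,64
Pyrethroids,type_1,70
Carbamates,type_2,99
"""

# read_csv accepts any file-like object, not only a file path
toy = pd.read_csv(io.StringIO(csv_text))

first_five = toy.head()    # first 5 rows (the default)
first_two = toy.head(2)    # first n rows
last_three = toy.tail(3)   # last n rows

print(toy.shape)           # (number of rows, number of columns)
```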

👇 Now import the `drugs.csv` dataset and save it as `df_02`.

In [None]:
# write and run your code here

<details><summary markdown='span'>View solution
</summary>

```python
df_02 = pd.read_csv("./data/workshop/drugs.csv")
df_02.head()
```

</details>

🌏 This dataset includes data on the efficacy of antimalarial therapies and is kindly provided by the WHO through their data download [page](https://apps.who.int/malaria/maps/threats/#/download).

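## Summarizing the dataset

👇 `describe()` gives summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns of `df_01`. `info()` lists each column's data type and its number of non-null values:

In [None]: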
df_01.describe()

In [None]:
df_01.info()
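
The sketch below shows what these two methods report on a hypothetical four-row DataFrame with made-up values. The `count` row of `describe()` excludes missing values. Text columns are left out by default, but calling `describe()` on a single text column gives its count, number of unique values, most frequent value (`top`) and that value's frequency (`freq`).

```python
import pandas as pd

# Hypothetical toy data (made-up values) with one text and one numeric column
toy = pd.DataFrame({
    "INSECTICIDE_CLASS": ["Pyrethroids", "Pyrethroids", "Carbamates", "Organophosphates"],
    "MORTALITY_ADJUSTED": [80.0, 90.0, 100.0, None],  # one missing value
})

# By default describe() summarises numeric columns only
summary = toy.describe()
print(summary)

# Summary of a text column: count, unique, top, freq
text_summary = toy["INSECTICIDE_CLASS"].describe()
print(text_summary)

# info() prints dtypes and non-null counts per column (it returns None)
toy.info()
```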

❓ How many unique values does `PLASMODIUM_SPECIES` take in `df_02`?

In [None]:
df_02['PLASMODIUM_SPECIES'].nunique()
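
Note that `nunique()` ignores missing values unless you pass `dropna=False`. Here is a minimal sketch on a made-up Series (hypothetical values, not the WHO data):

```python
import pandas as pd

# Hypothetical column with a repeated value and a missing entry
species = pd.Series(["P. falciparum", "P. vivax", "P. falciparum", None])

n_default = species.nunique()                   # missing values ignored
n_with_missing = species.nunique(dropna=False)  # missing counted as a value
values = species.unique()                       # the distinct values themselves

print(n_default, n_with_missing)
```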

❓ How many observations are there for each species in `df_02`?

In [None]:
# write, uncomment and run your code here

<details><summary markdown='span'>Hint
</summary>
Try the `value_counts()` method.

</details>

<details><summary markdown='span'>View solution
</summary>

```python
df_02['PLASMODIUM_SPECIES'].value_counts()
```

</details>
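
`value_counts()` sorts categories from most to least frequent. With `normalize=True` it returns proportions instead of counts. The sketch below uses a hypothetical Series of species names with made-up counts. It also shows `groupby(...).size()`, which gives the same counts ordered by label instead of by frequency.

```python
import pandas as pd

# Hypothetical species column (made-up counts)
species = pd.Series([
    "P. falciparum", "P. vivax", "P. falciparum",
    "P. falciparum", "P. vivax", "P. malariae",
])

counts = species.value_counts()                # most frequent first
shares = species.value_counts(normalize=True)  # proportions summing to 1
by_label = species.groupby(species).size()     # same counts, sorted by label

print(counts)
print(shares)
```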

## Visualising the dataset

👇 Seaborn lets us draw attractive plots in just a few lines:

In [None]:
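# bar height = mean MORTALITY_ADJUSTED per insecticide class, one colour per test type
ax = sns.barplot(x="INSECTICIDE_CLASS", y="MORTALITY_ADJUSTED", hue="TEST_TYPE", data=df_01)
# move the legend outside the axes so it does not cover the bars
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))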
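
By default, `sns.barplot` draws the mean of `y` for each category, with error bars showing the uncertainty of that mean. You can check each bar's height with a pandas `groupby`. The sketch below uses hypothetical toy data: the values are made up, and `type_1`/`type_2` are placeholder test types.

```python
import pandas as pd

# Hypothetical toy data (made-up values)
toy = pd.DataFrame({
    "INSECTICIDE_CLASS": ["Pyrethroids", "Pyrethroids", "Pyrethroids", "Carbamates", "Carbamates"],
    "TEST_TYPE": ["type_1", "type_1", "type_2", "type_1", "type_2"],
    "MORTALITY_ADJUSTED": [80.0, 90.0, 60.0, 100.0, 96.0],
})

# Mean of y for every (x category, hue category) pair
bar_heights = (
    toy.groupby(["INSECTICIDE_CLASS", "TEST_TYPE"])["MORTALITY_ADJUSTED"]
    .mean()
    .unstack("TEST_TYPE")  # rows = x categories, columns = hue categories
)
print(bar_heights)
```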

👾 Your turn!

👇 Correct the code below so that it displays a violin plot:

In [None]:
sns.violinplot("YEAR_START", "MORTALITY_ADJUSTED", data = df_01)

<details><summary markdown='span'>View solution
</summary>

```python
sns.violinplot(x="YEAR_START", y="MORTALITY_ADJUSTED", data=df_01)
```

</details>
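
For each `x` category, a violin plot shows a smoothed estimate of how the `y` values are distributed. By default it also draws a small box plot inside each violin. To check the numbers behind it, compute the per-group quartiles with pandas. The sketch below uses made-up values:

```python
import pandas as pd

# Hypothetical toy data (made-up values)
toy = pd.DataFrame({
    "YEAR_START": [2015, 2015, 2015, 2016, 2016, 2016],
    "MORTALITY_ADJUSTED": [40.0, 60.0, 80.0, 90.0, 95.0, 100.0],
})

# 25th, 50th (median) and 75th percentiles of y for each year,
# using pandas' default linear interpolation
quartiles = (
    toy.groupby("YEAR_START")["MORTALITY_ADJUSTED"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()  # one row per year, one column per quantile
)
print(quartiles)
```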

👇 Complete the code below so that the data points are coloured by `PLASMODIUM_SPECIES` (using the `hue` parameter).

In [None]:
sns.stripplot(x = 'PLASMODIUM_SPECIES', y = 'TREATMENT_FAILURE_PP', data = df_02, alpha = 0.2, jitter = 0.4)

<details><summary markdown='span'>View solution
</summary>

```python
sns.stripplot(x = 'PLASMODIUM_SPECIES', y = 'TREATMENT_FAILURE_PP', hue = 'PLASMODIUM_SPECIES', data = df_02, alpha = 0.2, jitter = 0.4)
```

</details>

🚀 Feel free to play around and create your own plots!

📝 [Seaborn Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf)