# Project - CO2 Per Capita

![Data Science Workflow](img/ds-workflow.png)

## Goal of Project
- Explore how Data Visualization can help present findings with a message
- We will explore the CO2 per capita dataset
- It will be your task to decide what message you want the receiver to get
- NOTE: Our skills are still limited, so we must limit the ambitions of our analysis

## Step 1: Acquire
- Explore problem
- Identify data
- Import data

### Step 1.a: Import libraries
- Execute the cell below (SHIFT + ENTER)
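
The cell this step refers to is not shown here. As a rough sketch, and assuming the project only needs pandas plus matplotlib for the plots, it could look like this:

```python
# Assumed imports for this project; the original cell is not included here.
import pandas as pd
import matplotlib.pyplot as plt

# In a Jupyter notebook, the following magic renders plots inline:
# %matplotlib inline
```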

### Step 1.b: Read the data
- Use ```pd.read_csv()``` to read the file `files/WorldBank-ATM.CO2E.PC_DS2.csv`
- NOTE: Remember to assign the result to a variable (e.g., ```data```)
- NOTE: Use ```index_col=0``` as argument to set the index column.
- Apply ```.head()``` on the data to check that everything looks as expected
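
A minimal sketch of this step follows. So that it runs without the file, it reads a small in-memory stand-in: the column names (`AAA`, `BBB`, `CCC`) and all values are made up for illustration and are not taken from the World Bank data.

```python
import io
import pandas as pd

# Hypothetical stand-in for files/WorldBank-ATM.CO2E.PC_DS2.csv (made-up values).
csv_text = """Year,AAA,BBB,CCC
1998,1.0,4.0,
2008,1.5,3.0,2.0
2018,2.0,2.0,2.5
"""

# In the notebook you would read the real file instead:
# data = pd.read_csv('files/WorldBank-ATM.CO2E.PC_DS2.csv', index_col=0)
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The first column (the year) is now the index; the empty field became NaN.
print(data.head())
```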

### Step 1.c: Size of data
- The columns represent countries and the rows represent years
- Apply ```.shape``` on the DataFrame to see if data is as expected

## Step 2: Prepare
- Explore data
- Visualize ideas
- Cleaning data

### Step 2.a: Check the data types
- This step tells you if a numeric column is not represented as a numeric type.
- Get the data types with ```.dtypes```
- We expect all data to be numeric
- Try out ```.info()``` to get an overview.
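
To illustrate why this check matters, here is a small sketch with a made-up DataFrame in which one column contains a stray text entry and is therefore not stored as numbers. The column names and values are hypothetical.

```python
import pandas as pd

# Made-up example: column BBB contains the text 'n/a', so it is not numeric.
data = pd.DataFrame(
    {"AAA": [1.0, 1.5, 2.0], "BBB": ["4.0", "n/a", "2.0"]},
    index=[1998, 2008, 2018],
)

# One data type per column; AAA is float64, BBB is a non-numeric type.
print(data.dtypes)

# List the columns that are not numeric.
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Compact overview: index, columns, non-null counts and data types.
data.info()
```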

### Step 2.b: Check for null (missing) values
- Data often has missing entries - there can be many reasons for this
- We need to deal with that (we will do so later in the course)
- Use ```.isnull().any()``` to see which columns contain missing values
- Missing values are expected here - but we need to be aware of them
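
As a small sketch of what the check returns, the example below uses a made-up DataFrame with one missing value. The names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up example: BBB has no value for 1998.
data = pd.DataFrame(
    {"AAA": [1.0, 1.5, 2.0], "BBB": [np.nan, 3.0, 2.0]},
    index=[1998, 2008, 2018],
)

# One boolean per column: True if the column has at least one missing value.
has_missing = data.isnull().any()
print(has_missing)
```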

### Step 2.c: Visualize number of missing data points
- To get an idea of the magnitude of the problem, you can visualize the number of missing values for each country.
- ```.isnull()``` identifies missing values
- ```.isnull().sum()``` counts the number of missing values per country
- ```.isnull().sum().plot.hist()``` plots a histogram of how many countries fall into each range of missing-value counts
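
The chain above can be sketched on a made-up DataFrame as follows. The data and the axis label are illustrative assumptions, and matplotlib is assumed to be installed since pandas uses it for plotting.

```python
import numpy as np
import pandas as pd

# Made-up example with a different number of missing years per country.
data = pd.DataFrame(
    {
        "AAA": [1.0, 1.5, 2.0, 2.5],
        "BBB": [np.nan, 3.0, 2.0, 1.0],
        "CCC": [np.nan, np.nan, 2.0, 2.5],
        "DDD": [np.nan, np.nan, np.nan, 0.5],
    },
    index=[1988, 1998, 2008, 2018],
)

# Count the missing values in each column (country).
missing_per_country = data.isnull().sum()
print(missing_per_country)

# Histogram: x-axis is the number of missing values, y-axis the number of countries.
ax = missing_per_country.plot.hist()
ax.set_xlabel("Number of missing years")
```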

### Step 2.d: Clean data
- A simple way to clean data is to remove columns with missing data
- Use ```.dropna(axis='columns')``` to remove columns with missing data
- Check how many columns are left
    - HINT: apply ```len(...)``` on the DataFrame columns
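
A minimal sketch of the cleaning step on a made-up DataFrame (hypothetical names and values), keeping only the columns without missing data:

```python
import numpy as np
import pandas as pd

# Made-up example: only BBB has a missing value.
data = pd.DataFrame(
    {"AAA": [1.0, 1.5, 2.0], "BBB": [np.nan, 3.0, 2.0], "CCC": [2.0, 2.2, 2.5]},
    index=[1998, 2008, 2018],
)

# Drop every column that contains at least one missing value.
data_clean = data.dropna(axis='columns')

# Compare the number of columns before and after cleaning.
print(len(data.columns), len(data_clean.columns))
```

Note that this approach discards a whole country for a single missing year, which is simple but can remove a lot of data.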

## Step 3: Analyze
- Feature selection
- Model selection
- Analyze data

### Step 3.a: Percentage change over 20 years
- Calculate the change in CO2 per capita from 1998 to 2018
    - HINT: Formula is (value in 2018 - value in 1998) / value in 1998 (multiply by 100 to get percent)
    - This can be calculated for all countries (columns) simultaneously
        - ```(data_clean.loc[2018] - data_clean.loc[1998])/data_clean.loc[1998]``` assuming the data is in ```data_clean```
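
The formula can be sketched on a made-up ```data_clean``` as follows. The values are hypothetical and chosen so the result is easy to verify by hand.

```python
import pandas as pd

# Made-up example: AAA doubles from 1998 to 2018, BBB halves.
data_clean = pd.DataFrame(
    {"AAA": [1.0, 1.5, 2.0], "BBB": [4.0, 3.0, 2.0]},
    index=[1998, 2008, 2018],
)

# .loc[year] selects one row, i.e. one value per country, so the arithmetic
# is applied to all countries at once.
change = (data_clean.loc[2018] - data_clean.loc[1998]) / data_clean.loc[1998]

# The result is a fraction; multiply by 100 to express it in percent.
change_pct = change * 100
print(change_pct)
```

Here `AAA` comes out at 100 percent (a doubling) and `BBB` at -50 percent (a halving).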

### Step 3.b: Describe the data
- A great way to understand data is to apply ```.describe()```
- How does this help you to understand data?
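
As a sketch, applying ```.describe()``` to a made-up Series of relative changes (hypothetical values) returns the count, mean, standard deviation, minimum, quartiles and maximum in one call:

```python
import pandas as pd

# Made-up relative changes per country (fractions, not percent).
data_plot = pd.Series({"AAA": 1.0, "BBB": -0.5, "CCC": 0.2, "DDD": -0.1, "EEE": -0.3})

# Summary statistics: count, mean, std, min, 25%, 50%, 75%, max.
summary = data_plot.describe()
print(summary)
```

Comparing the median (the ```50%``` row) with the mean gives a first hint of whether a few extreme countries dominate the average.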

### Step 3.c: Visualization of data
- This helps you to understand data better
- We start with a histogram ```.plot.hist(bins=30)```
- Try a pie chart showing the share of values below 0
    - HINT: Use ```(data_plot < 0).value_counts()``` (assuming data is in ```data_plot```)
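    - Chart: ```.plot.pie(colors=['r', 'g'], labels=['>= 0', '< 0'], title='Title', ylabel='label', autopct='%1.1f%%')```
    - NOTE: ```.value_counts()``` sorts by frequency, so check that the order of ```labels``` and ```colors``` matches the order of the counts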
- Play around with other visualizations
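
A sketch of the pie chart on made-up data follows. The values and the title are hypothetical. Because ```.value_counts()``` orders its result by frequency, the sketch reorders the counts explicitly so the labels and colors always refer to the right slice; this reordering is an addition to the hint above, not part of it.

```python
import pandas as pd

# Made-up relative changes per country (fractions, not percent).
data_plot = pd.Series({"AAA": 1.0, "BBB": -0.5, "CCC": 0.2, "DDD": -0.1, "EEE": -0.3})

# Count how many countries are below 0 (True) and how many are not (False).
counts = (data_plot < 0).value_counts()

# Fix the order to [False, True] so it matches labels ['>= 0', '< 0'].
counts = counts.reindex([False, True], fill_value=0)

ax = counts.plot.pie(
    colors=['r', 'g'],
    labels=['>= 0', '< 0'],
    title='Share of example countries with falling CO2 per capita',
    ylabel='',
    autopct='%1.1f%%',
)
```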

## Step 4: Report
- Present findings
- Visualize results
- Credibility counts

### Step 4.a: Present a chart
- The goal here is to present your message
- Visualize one chart
- Add a headline (title) to give the audience a message
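
One possible way to do this is sketched below, using made-up values and a hypothetical headline. The point is that the title states the message rather than just naming the data.

```python
import pandas as pd

# Made-up relative changes per country (fractions, not percent).
change = pd.Series({"AAA": 1.0, "BBB": -0.5, "CCC": 0.2, "DDD": -0.1, "EEE": -0.3})

# Sort so the reader can compare countries at a glance, and convert to percent.
change_pct = change.sort_values() * 100

# The headline carries the message; the axis label explains the numbers.
ax = change_pct.plot.barh(title="CO2 per capita fell in 3 of 5 example countries")
ax.set_xlabel("Change in CO2 per capita, 1998 to 2018 (%)")
```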

### Step 4.b (Optional): Present another chart
- Can you make a supporting chart?
- Or dig deeper into the data?
- Does this give a true picture of the situation?
- Ideas:
    - Look at the last 10 years
    - Are many countries close to 0?

## Step 5: Actions
- Use insights
- Measure impact
- Main goal

### Step 5.a: Actions
- Propose actions

### Step 5.b: Measure impact
- Propose how to measure impact of actions