## Task

In this compulsory task you will clean the country column and parse the date column in the **store_income_data_task.csv** file.

In [23]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Load store_income_data_task.csv
df = pd.read_csv('store_income_data_task.csv')

1. Take a look at all the unique values in the "country" column. Then, convert the column to lowercase and remove any leading or trailing whitespace.

In [24]:
# Display unique values in the "country" column
print("Unique values in 'country' column before cleaning:")
print(df['country'].unique())

# Clean up the "country" column
df['country'] = df['country'].str.lower().str.strip()


Unique values in 'country' column before cleaning:
['United States/' 'Britain' ' United States' 'Britain/' ' United Kingdom'
 'U.K.' 'SA ' 'U.K/' 'America' 'United Kingdom' nan 'united states'
 ' S.A.' 'England ' 'UK' 'S.A./' 'ENGLAND' 'BRITAIN' 'U.K' 'U.K '
 'America/' 'SA.' 'S.A. ' 'u.k' 'uk' ' ' 'UK.' 'England/' 'england'
 ' Britain' 'united states of america' 'UK/' 'SA/' 'SA' 'England.'
 'UNITED KINGDOM' 'America.' 'S.A..' 's.a.' ' U.K'
 ' United States of America' 'Britain ' 'England' ' SA'
 'United States of America.' 'United States of America/' 'United States.'
 's. africasouth africa' ' England' 'United Kingdom '
 'United States of America ' ' UK' 'united kingdom' 'AMERICA' 'America '
 'UNITED STATES OF AMERICA' ' S. AfricaSouth Africa' 'america'
 'S. AFRICASOUTH AFRICA' 'Britain.' '/' 'United Kingdom.' 'United States'
 ' America' 'UNITED STATES' 'sa' 'United States of America' 'UK '
 'United States ' 'S. AfricaSouth Africa/' 'S.A.' 'United Kingdom/'
 'S. AfricaSouth Africa ' '
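
To see what the two string methods do, and what they leave behind, here is a small self-contained sketch on a handful of values copied from the output above. The `sample` Series is purely illustrative and is not part of the task file.

```python
import numpy as np
import pandas as pd

# Illustrative sample only: a few raw values copied from the output above
sample = pd.Series(['United States/', ' United Kingdom', 'SA ', 'U.K.', np.nan, ' '])

# .str.lower() lowercases; .str.strip() removes leading AND trailing whitespace
cleaned = sample.str.lower().str.strip()

print(cleaned.tolist())
```

Whitespace-only entries become empty strings, and trailing `/` or `.` characters are untouched, so the column still needs further cleaning after this step.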

2. Note that there should only be three separate countries. Eliminate all variations, so that 'South Africa', 'United Kingdom' and 'United States' are the only three countries.

In [25]:
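# Map a few abbreviations to full country names.
# Note: replace() with a dict only changes values that match a key exactly,
# so variants such as 'u.k.' or 'america' are left as they are.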
df['country'] = df['country'].replace({'usa': 'united states', 'us': 'united states', 'uk': 'united kingdom'})

# Display unique values in the "country" column after cleaning
print("\nUnique values in 'country' column after cleaning:")
print(df['country'].unique())


Unique values in 'country' column after cleaning:
['united states/' 'britain' 'united states' 'britain/' 'united kingdom'
 'u.k.' 'sa' 'u.k/' 'america' nan 's.a.' 'england' 's.a./' 'u.k'
 'america/' 'sa.' '' 'uk.' 'england/' 'united states of america' 'uk/'
 'sa/' 'england.' 'america.' 's.a..' 'united states of america.'
 'united states of america/' 'united states.' 's. africasouth africa'
 'britain.' '/' 'united kingdom.' 's. africasouth africa/'
 'united kingdom/' 's. africasouth africa.' '.']
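
The output above still lists many variants, because `replace` with a dictionary only substitutes values that match a key exactly. One possible way to finish the job is sketched below: trim stray `/` and `.` characters, then map every remaining variant to one of the three names. The `raw` Series, the `canonical` mapping and the choice to treat empty entries as missing are assumptions made for illustration; the keys are taken from the unique values printed above.

```python
import numpy as np
import pandas as pd

# Illustrative sample: values as they look after lowercasing and stripping
raw = pd.Series(['united states/', 'britain', 'u.k.', 's.a..', 'sa/',
                 's. africasouth africa.', 'america', 'england.', '/', '', np.nan])

# Remove stray '/', '.' and spaces from both ends ('u.k.' -> 'u.k', 's.a..' -> 's.a')
trimmed = raw.str.strip(' ./')

# Hypothetical mapping from every remaining variant to one of three names
canonical = {
    'united states': 'united states',
    'united states of america': 'united states',
    'america': 'united states',
    'united kingdom': 'united kingdom',
    'britain': 'united kingdom',
    'england': 'united kingdom',
    'uk': 'united kingdom',
    'u.k': 'united kingdom',
    'south africa': 'south africa',
    'sa': 'south africa',
    's.a': 'south africa',
    's. africasouth africa': 'south africa',
}

# .map() turns anything not in the dictionary (including '') into NaN
countries = trimmed.map(canonical)

print(countries.dropna().unique())
```

Applied to the real column, the same two steps would be `df['country'].str.strip(' ./').map(canonical)`. Because `.map()` silently converts unknown values to NaN, it is worth checking the unique values again afterwards to confirm that no unexpected variant was dropped.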


3. Create a new column called `days_ago` in the DataFrame that holds, for each row, the number of days between the 'date_measured' value and the current date. Note that the current date can be obtained using `datetime.date.today()` (with `import datetime`) or, as in the cell below, `datetime.today().date()`.

In [26]:
# Get today's date (time set to midnight) as a pandas Timestamp
current_date = pd.Timestamp(datetime.today().date())

# Parse "date_measured" (day-month-year strings) into datetime values
df['date_measured'] = pd.to_datetime(df['date_measured'], format='%d-%m-%Y')

# Create a new column called "days_ago": whole days between today and each date
df['days_ago'] = (current_date - df['date_measured']).dt.days

# Display the DataFrame with the new "days_ago" column
print("\nDataFrame with the 'days_ago' column:")
print(df.head())


DataFrame with the 'days_ago' column:
   id                   store_name         store_email  department  \
0   1   Cullen/Frost Bankers, Inc.                 NaN    Clothing   
1   2          Nordson Corporation                 NaN       Tools   
2   3        Stag Industrial, Inc.                 NaN      Beauty   
3   4          FIRST REPUBLIC BANK  ecanadine3@fc2.com  Automotive   
4   5  Mercantile Bank Corporation                 NaN        Baby   

         income date_measured         country  days_ago  
0  $54438554.24    2006-02-04  united states/      6658  
1  $41744177.01    2006-01-04         britain      6689  
2  $36152340.34    2003-09-12   united states      7534  
3   $8928350.04    2006-05-08        britain/      6565  
4  $33552742.32    1973-01-21  united kingdom     18725  
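
The day-count arithmetic can be checked on a tiny example. The sketch below uses a fixed reference date instead of today's date so that the result is reproducible. The sample dates are the first two from the output above, written in the day-month-year form that `format='%d-%m-%Y'` expects, and the reference date is an arbitrary assumption.

```python
import pandas as pd

# Illustrative sample: two dates in day-month-year form
dates = pd.Series(['04-02-2006', '04-01-2006'])
parsed = pd.to_datetime(dates, format='%d-%m-%Y')

# Assumed fixed reference date (stands in for "today" to keep the result stable)
reference = pd.Timestamp('2006-03-01')

# Subtracting datetimes gives timedeltas; .dt.days extracts the whole days
days_ago = (reference - parsed).dt.days

print(days_ago.tolist())  # → [25, 56]
```

The two results differ by 31 days, the same gap as between the first two `days_ago` values in the output above (6689 and 6658), which is the length of January.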
