# LIFE EXPECTANCY PROJECT 
## THIS CODE DOWNLOADS LIFE EXPECTANCY DATA FROM OUR WORLD IN DATA, RESTRUCTURES IT, AND STORES IT IN MONGODB

# IMPORTING REQUIRED LIBRARIES


In [3]:

import requests
import pandas as pd
from pymongo import MongoClient


# FETCHING DATA FROM THE API

In [5]:

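import io  # lets pandas read the downloaded text as if it were a file

my_url = "https://ourworldindata.org/grapher/life-expectancy.csv"
response = requests.get(my_url, timeout=30)  # download the CSV once
response.raise_for_status()  # stop here if the server returned an error
my_data = pd.read_csv(io.StringIO(response.text))  # parse the downloaded text instead of downloading again
print("I got the data from the website!")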
print("First few rows of raw data:")
print(my_data.head())

I got the data from the website!
First few rows of raw data:
        Entity Code  Year  \
0  Afghanistan  AFG  1950   
1  Afghanistan  AFG  1951   
2  Afghanistan  AFG  1952   
3  Afghanistan  AFG  1953   
4  Afghanistan  AFG  1954   

   Period life expectancy at birth - Sex: total - Age: 0  
0                                            28.1563      
1                                            28.5836      
2                                            29.0138      
3                                            29.4521      
4                                            29.6975      
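The fetch cell downloads the CSV over HTTP and hands the text to pandas. A minimal sketch of the parsing half follows, assuming the CSV text is already in memory (for example `response.text` from `requests.get`), so it can be tried without network access. The helper name `parse_life_expectancy_csv` and the column check are illustrative additions, not part of the original notebook; the two sample rows and the long column name are copied from the output above.

```python
import io

import pandas as pd

# Column name exactly as it appears in the raw download shown above
RAW_VALUE_COLUMN = "Period life expectancy at birth - Sex: total - Age: 0"


def parse_life_expectancy_csv(csv_text: str) -> pd.DataFrame:
    """Parse Our World in Data CSV text into a DataFrame (hypothetical helper)."""
    # read_csv accepts any file-like object, so wrap the string in StringIO
    df = pd.read_csv(io.StringIO(csv_text))
    # Fail early if the upstream column names change, rather than with a KeyError later
    expected = {"Entity", "Code", "Year", RAW_VALUE_COLUMN}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    return df


# Two rows copied from the output above, used as an offline sample
sample_csv = (
    f"Entity,Code,Year,{RAW_VALUE_COLUMN}\n"
    "Afghanistan,AFG,1950,28.1563\n"
    "Afghanistan,AFG,1951,28.5836\n"
)
raw = parse_life_expectancy_csv(sample_csv)
print(raw.shape)  # (2, 4)
```

Separating download from parsing this way keeps the network call in one place, so the parsing and cleaning logic can be checked against a small hand-written CSV.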


# CLEANING THE DATA

In [7]:

# Give the columns shorter, clearer names
my_data = my_data.rename(columns={
    "Entity": "Country",
    "Period life expectancy at birth - Sex: total - Age: 0": "Life Expectancy",
})

# Only keep the columns I want
my_data = my_data[["Country", "Year", "Life Expectancy"]]

# Drop rows that have a missing value in any of the kept columns
my_data = my_data.dropna()

# Make sure the data types are correct
my_data["Year"] = my_data["Year"].astype(int)  # Make Year a number
my_data["Life Expectancy"] = my_data["Life Expectancy"].astype(float)  # Make Life Expectancy a decimal

# Remove any duplicate rows
my_data = my_data.drop_duplicates()

# Sort the data by Country and Year
my_data = my_data.sort_values(by=["Country", "Year"])

# Show the cleaned data
print()  # blank line before the next table
print("Cleaned data looks like this:")
print(my_data.head())  # Show first 5 rows again



Cleaned data looks like this:
       Country  Year  Life Expectancy
0  Afghanistan  1950          28.1563
1  Afghanistan  1951          28.5836
2  Afghanistan  1952          29.0138
3  Afghanistan  1953          29.4521
4  Afghanistan  1954          29.6975
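The cleaning cell applies its steps one assignment at a time. A minimal sketch of the same steps as a single pandas method chain is shown below, which makes them easy to reuse and test. The helper name `clean_life_expectancy` is hypothetical, and the final `reset_index(drop=True)` is an addition the notebook does not perform (it renumbers rows after sorting). The Afghanistan values come from the output above; the "Example Country" row is synthetic, included only to exercise `dropna`.

```python
import pandas as pd

RAW_VALUE_COLUMN = "Period life expectancy at birth - Sex: total - Age: 0"


def clean_life_expectancy(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps as one method chain (hypothetical helper)."""
    return (
        raw.rename(columns={"Entity": "Country", RAW_VALUE_COLUMN: "Life Expectancy"})
        [["Country", "Year", "Life Expectancy"]]  # keep only the needed columns
        .dropna()  # drop rows with any missing value
        .astype({"Year": int, "Life Expectancy": float})  # enforce column types
        .drop_duplicates()  # remove exact duplicate rows
        .sort_values(["Country", "Year"])  # order by country, then year
        .reset_index(drop=True)  # addition: renumber rows 0..n-1 after sorting
    )


raw = pd.DataFrame(
    {
        "Entity": ["Afghanistan", "Afghanistan", "Afghanistan", "Example Country"],
        "Code": ["AFG", "AFG", "AFG", "XXX"],
        "Year": [1951, 1950, 1950, 1950],
        # The last value is missing on purpose (synthetic row)
        RAW_VALUE_COLUMN: [28.5836, 28.1563, 28.1563, None],
    }
)
cleaned = clean_life_expectancy(raw)
print(cleaned)
```

Here the synthetic row is dropped for its missing value, the repeated 1950 row is removed as a duplicate, and the remaining rows come out in year order.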


# SAVING PROCESSED DATA INTO MONGODB

In [9]:

# ESTABLISH A CONNECTION
mongo = MongoClient('mongodb://localhost:27017/')
my_database = mongo['life_expectancy_db']  # My database name
my_collection = my_database['life_expectancy']  # My collection name

# REMOVING OLD DATA
my_collection.drop()

# CONVERTING DATA INTO A LIST OF DICTIONARIES (ONE PER ROW)
data_list = my_data.to_dict('records')

# ADDING DATA TO MONGODB
my_collection.insert_many(data_list)
print("I added the data to MongoDB")

# VERIFYING DATA
count = my_collection.count_documents({})
print("Total documents in MongoDB:", count)

# SHOW DATA TO VERIFY 
first = my_collection.find_one()
print("First document in MongoDB:", first)

# CREATING AN INDEX ON COUNTRY AND YEAR
my_collection.create_index([("Country", 1), ("Year", 1)])
print("I made an index for Country and Year!")

# CLOSING CONNECTION
mongo.close()
print("Closed the MongoDB connection")

I added the data to MongoDB
Total documents in MongoDB: 21565
First document in MongoDB: {'_id': ObjectId('68026ba54ffc2e0c78267da7'), 'Country': 'Afghanistan', 'Year': 1950, 'Life Expectancy': 28.1563}
I made an index for Country and Year!
Closed the MongoDB connection
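The MongoDB cell drops the collection, inserts one document per row, counts the documents, and builds a compound index. A minimal sketch of that load step wrapped in a function is shown below; it also raises an error if the stored count differs from the DataFrame's row count. The helper name `load_dataframe` is hypothetical, and `unique=True` on the index is an assumption (the notebook creates a non-unique index): it makes MongoDB reject a second document for the same (Country, Year) pair. The function only relies on the `drop`, `insert_many`, `create_index`, and `count_documents` methods of a pymongo `Collection`.

```python
from typing import Any

import pandas as pd


def load_dataframe(collection: Any, df: pd.DataFrame) -> int:
    """Replace the collection's contents with df and return the stored count.

    Hypothetical helper: `collection` is expected to be a pymongo Collection
    (or any object exposing drop, insert_many, create_index, count_documents).
    """
    collection.drop()  # remove documents from earlier runs
    records = df.to_dict("records")  # one dict per DataFrame row
    if records:  # insert_many raises on an empty list
        collection.insert_many(records)
    # Assumption: each (Country, Year) pair should appear once, so the index
    # is declared unique; creating it fails if duplicates were inserted.
    collection.create_index([("Country", 1), ("Year", 1)], unique=True)
    stored = collection.count_documents({})
    if stored != len(df):
        raise RuntimeError(f"expected {len(df)} documents, found {stored}")
    return stored
```

With a running server, the function could be called inside `with MongoClient('mongodb://localhost:27017/') as mongo:` and passed `mongo['life_expectancy_db']['life_expectancy']` and `my_data`. `MongoClient` supports the context-manager protocol, so the connection is closed even if an insert fails, which the explicit `mongo.close()` call in the notebook does not guarantee.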
