**For our project, we wanted to examine how much plastic pollution exists in our oceans and to predict how large an issue it will become in the future.**

**Where did the data come from?**  
Our datasets were obtained from [this Kaggle dataset](https://www.kaggle.com/sohamgade/plastic-datasets/version/1?select=per-capita-plastic-waste-vs-gdp-per-capita.csv), and the data itself comes from Our World in Data's article on [Plastic Pollution](https://ourworldindata.org/plastic-pollution).

In [93]:
import numpy as np
import pandas as pd
from sklearn import linear_model

In [121]:
#Importing and creating DataFrames to use for linear regression
path1 = 'FinalProject/global-plastics-production.csv'
path2 = 'FinalProject/plastic-waste-per-capita.csv'
path3 = 'FinalProject/mismanaged-waste-global-total.csv'
path4 = 'FinalProject/per-capita-plastic-waste-vs-gdp-per-capita.csv'

#CSV representing global plastic production
df1 = pd.read_csv(path1)

#CSV representing plastic waste per capita
df2 = pd.read_csv(path2)

#CSV representing global mismanaged waste
df3 = pd.read_csv(path3)

#CSV with per-capita plastic waste vs. GDP per capita; used here for its population column
df4 = pd.read_csv(path4)
#Keep only the rows for the year 2010
df4 = df4[df4['Year'] == 2010]

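#Create main DataFrame with important features
#Note: concat pairs df2 and df3 by row index, so both files must list entities in the same order
newFrame = [df2["Entity"], df2["Code"], df2["Per capita plastic waste (kg/person/day)"], df3["Mismanaged waste (% global total)"]]
newHeaders = ["Entity", "Code", "Per capita plastic waste (kg/person/day)", "Mismanaged waste (% global total)"]
df5 = pd.concat(newFrame, axis=1, keys=newHeaders)
#Add global plastics production as a constant column (the same value for every row)
#The value 313000000 is in tonnes (313 million tonnes), despite the column label
df5.insert(1, 'Global plastics production (million tonnes)', 313000000)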

#Merge the population DataFrame to be included in main DataFrame
newDf = pd.merge(df5, df4[['Entity', 'Total population (Gapminder, HYDE & UN)']], on='Entity', how='left')
newDf = newDf[pd.notna(newDf['Total population (Gapminder, HYDE & UN)'])]
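
The column-wise `pd.concat` above pairs rows by position, so it only lines up correctly if both CSVs list the same entities in the same order. As a minimal sketch of an alternative, the cell below joins two small made-up frames on `Entity` and uses `indicator=True` to list entities that found no match. The frame names `waste` and `mismanaged` and all values in them are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical toy frames standing in for df2 and df3; the values are made up.
waste = pd.DataFrame({
    "Entity": ["Albania", "Algeria", "Angola"],
    "Per capita plastic waste (kg/person/day)": [0.069, 0.144, 0.062],
})
mismanaged = pd.DataFrame({
    "Entity": ["Algeria", "Albania", "Brazil"],  # different order, one non-match
    "Mismanaged waste (% global total)": [1.63, 0.09, 1.48],
})

# Join on the country name rather than on row position.
joined = pd.merge(waste, mismanaged, on="Entity", how="left", indicator=True)

# Rows with no partner in `mismanaged` are flagged "left_only" in the _merge column.
unmatched = joined.loc[joined["_merge"] == "left_only", "Entity"].tolist()
print(unmatched)
```

Checking `unmatched` before dropping rows makes it easier to see which entities are lost to naming differences between files.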

In [122]:
#Multiple Linear Regression

X = newDf[['Global plastics production (million tonnes)', 'Per capita plastic waste (kg/person/day)', 'Total population (Gapminder, HYDE & UN)']]
y = newDf['Mismanaged waste (% global total)']

#Fit a multiple linear regression model on the columns above to predict each entity's share of global mismanaged waste
regression = linear_model.LinearRegression()
regression.fit(X, y)

#Example prediction (total plastic produced, waste per person, population)
predictedWaste = regression.predict([[313000000, .291, 36471837]])
predictedWaste

[0.56929254]
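
One thing worth checking in the model above: the global production column holds the same value in every row, so it cannot help the regression tell one row from another. The sketch below illustrates this on synthetic data (randomly generated, not the project's data) with scikit-learn's default `LinearRegression`, which fits an intercept. Under that assumption the coefficient on the constant column should come out at, or numerically very near, zero, and changing the constant should leave the predictions unchanged. The names `waste_pc`, `population` and `X_shifted` are illustrative only.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in data; none of these numbers come from the project's CSVs.
rng = np.random.default_rng(0)
n = 50
constant = np.full(n, 313000000.0)        # same value in every row
waste_pc = rng.uniform(0.01, 0.6, n)      # hypothetical per-capita waste
population = rng.uniform(1e5, 1e8, n)     # hypothetical populations
y = 2.0 * waste_pc + 1e-8 * population + rng.normal(0, 0.01, n)

X = np.column_stack([constant, waste_pc, population])
model = linear_model.LinearRegression().fit(X, y)

# Coefficient learned for the constant column.
print(model.coef_[0])

# Replace the constant with a different value and compare the predictions.
X_shifted = X.copy()
X_shifted[:, 0] = 400000000.0
print(np.allclose(model.predict(X), model.predict(X_shifted)))
```

For the production figure to act as a real predictor, it would need to vary across rows, for example by using one row per year from the production CSV.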
