Holding file for content that I am taking off the website but do not want to delete.

## How the Ocean Helps Us
**2a. Beach Attendance**
[Source](https://catalog.data.gov/dataset/swimming-beach-attendance)

- This data set covers the beach attendance in the state of New York
- Extraction Method: Exporting as CSV
- Link to Cleaned Data: [Here](data/01-modified-data/cleandata2a.csv)


![ ](images/data2.png){width=50%}

**2b. Employment in Ocean Sectors**
[Source](https://opdgig.dos.ny.gov/datasets/e25794dc9cab4fe8bd40d76a18c80f66/explore?showTable=true)

- This data illustrates 2013 ocean economy employment (number of jobs) for Northeast coastal counties, which comprise 33 counties from Maine to New York.
- Extraction Method: Exporting as CSV

![ ](images/data10.png){width=50%}


**3a. Plastic Pollution**
 [Source](https://ourworldindata.org/plastic-pollution)

 - This dataset includes global data on plastic waste generation, pollution and trade. The source also provides ready-made data visualizations to help readers understand the dataset.
 - Extraction Method: Exporting as CSV

 ![ ](images/data3.png){width=50%}


 **3b. Mismanaged Plastic Waste**
 [Source](https://www.kaggle.com/datasets/kkhandekar/mismanaged-plastic-waste-around-the-world/data)

 - This dataset provides information on mismanaged waste for many countries. Mismanaged waste is material at high risk of entering the ocean.
 - Extraction Method: Exporting as CSV

 ![ ](images/data7.png){width=50%}

 **4b. Marine Landings**
[Source](https://data.world/agriculture/aquaculture-data)

- This dataset gives the total amount (in millions) that marine landings bring in, profit-wise, for 40 different countries
- Extraction Method: Exporting as XLS

![ ](images/data4b.png){width=40%}


## Why Ocean Sustainability is Necessary
**1a. Plastic Leakage by Region**

**Raw Data:** [Here](data/00-raw-data/1a.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data1a.rmd)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Viewing Column Names to see duplicates
    - Noticed duplicate names for location, water source and time 
    - Checked whether the duplicates had the same values and, if not, how they differed
4. Getting rid of rows that had numbers for freshwater since my project is only addressing ocean sustainability. 
5. Removing Unnecessary columns that won’t help me address my question
6. Exporting as a CSV dataset to clean data folder
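
The linked cleaning script is written in R. Purely as an illustration, here is a minimal Python sketch of the row filter and column drop described above, using only the standard-library `csv` module. The sample rows, the `Unit` column, and the `Freshwater` / `Ocean` labels are made up for the example and are not taken from the raw file.

```python
import csv
import io

# Made-up sample rows; the real raw file has different columns and values.
RAW = """Location,Water Source,Time,Unit,Value
Region A,Freshwater,2019,Tonnes,10.0
Region A,Ocean,2019,Tonnes,4.0
Region B,Ocean,2060,Tonnes,7.5
"""

KEEP = ["Location", "Water Source", "Time", "Value"]  # columns to retain


def drop_freshwater(rows, keep=KEEP):
    """Keep only non-freshwater rows and only the columns in `keep`."""
    return [
        {col: row[col] for col in keep}
        for row in rows
        if row["Water Source"].strip().lower() != "freshwater"
    ]


rows = list(csv.DictReader(io.StringIO(RAW)))
clean = drop_freshwater(rows)
print(len(clean), list(clean[0]))
```
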

**Clean Data:** [Here](data/01-modified-data/cleandata1a.csv)

**Overview:**

- Number of Columns: 4
- Number of Rows: 630
- Column Names: Location, Water Source, Time, Values (Tonnes of Plastic)
- Time Frame: 2019 - 2060

## How the Ocean Helps Us

**2a. Beach Attendance**

**Raw Data:** [Here](data/00-raw-data/2a.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data2a.R)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Adding a month and year column to make analysis easier in the future
4. Exporting as a CSV to clean data folder
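
The linked script is in R; as a rough Python illustration of the month and year step, the sketch below derives both columns from the `Date` field. The sample rows and the `MM/DD/YYYY` date format are assumptions, so the format string would need adjusting to match the real file.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; the date format (MM/DD/YYYY) is an assumption.
RAW = """Date,Beach,Attendance
07/04/2019,Beach A,1200
08/15/2021,Beach B,300
"""


def add_month_year(rows, fmt="%m/%d/%Y"):
    """Return copies of the rows with integer Month and Year fields added."""
    out = []
    for row in rows:
        parsed = datetime.strptime(row["Date"], fmt)
        out.append({**row, "Month": parsed.month, "Year": parsed.year})
    return out


rows = add_month_year(csv.DictReader(io.StringIO(RAW)))
print(rows[0]["Month"], rows[0]["Year"])  # prints: 7 2019
```
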

**Clean Data:** [Here](data/01-modified-data/cleandata2a.csv)

**Overview:**

- Number of Columns: 5
- Number of Rows: 5052
- Column Names: Date, Beach, Attendance, Month, Year
- Time Frame: 2017 - 2022

 **2b. Employment in Ocean Sectors**

**Raw Data:** [Here](data/00-raw-data/2b.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data2b.R)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Made column headers shorter for ALL columns
4. Added a total employees column that calculates the total ocean sector employees for each county
5. Exporting as a CSV to clean data folder
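
The linked script is in R; the Python sketch below only illustrates how a per-county total could be computed by summing the sector columns. The county names and numbers are invented, the `Total Employees` column name is hypothetical, and skipping blank cells is an assumption about how missing counts should be handled.

```python
import csv
import io

# Made-up numbers and county names; only the sector names come from the overview.
RAW = """County,Employment Construction,Employment Tourism Recreation,Employment Transportation
County A,10,200,35
County B,5,,20
"""


def add_total(rows, label="County", total="Total Employees"):
    """Sum every non-label column into a new total column, skipping blanks."""
    out = []
    for row in rows:
        counts = [int(v) for k, v in row.items() if k != label and v.strip()]
        out.append({**row, total: sum(counts)})
    return out


rows = add_total(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    print(row["County"], row["Total Employees"])
```
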

**Clean Data:** [Here](data/01-modified-data/cleandata2b.csv)

**Overview:**

- Number of Columns: 8
- Number of Rows: 34
- Column Names: Employment Construction, Employment Living Resources, Employment Minerals, Employment Ship/Boat Building, Employment Tourism Recreation, Employment Transportation
- Time Frame: 2013 

**2c. Fishing Vessels**

**Raw Data:** [Here](data/00-raw-data/2c.xls)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data2c.r)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Changing header names and setting missing data to NA
4. Exporting as CSV to clean data folder

**Clean Data:** [Here](data/01-modified-data/cleandata2c.csv)

**Overview:**

- Number of Columns: 22
- Number of Rows: 45
- Column Names: Country, 2000 - 2021
- Time Frame: 2000 - 2021


## Negative Impacts on Ocean Sustainability / Human Impact
**3a. Plastic Pollution**

**Raw Data:** [Here](data/00-raw-data/3a.csv)

**Cleaning Steps:** [Python Code](codes/02-data-cleaning/cleaning_data3a.py)

1. Importing Data into a Python Environment 
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Counting number of NaN values in columns
    - Found only five, all in the country code column
    - Determined it is okay to keep these NaN values because the country name is also listed
4. Renaming columns: 'Entity' to 'Country' and 'Mismanaged plastic waste to ocean per capita (kg per year)' to 'Mismanaged Plastic'
5. Exporting data as a CSV to clean data folder
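
As a rough sketch of the missing-value count and the column renames, the example below uses only the standard-library `csv` module, so it is not necessarily how the linked script does it. The sample rows are invented; only the original and new column names come from the steps above.

```python
import csv
import io

LONG = "Mismanaged plastic waste to ocean per capita (kg per year)"
RENAME = {"Entity": "Country", LONG: "Mismanaged Plastic"}

# Made-up sample rows; values are placeholders, not real figures.
RAW = f"""Entity,Code,Year,{LONG}
Country A,AAA,2019,0.1
Region B,,2019,0.2
"""


def count_missing(rows):
    """Count empty cells per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (value.strip() == "")
    return counts


def rename_columns(rows, mapping):
    """Return rows with keys renamed according to `mapping`."""
    return [{mapping.get(col, col): v for col, v in row.items()} for row in rows]


rows = list(csv.DictReader(io.StringIO(RAW)))
missing = count_missing(rows)
clean = rename_columns(rows, RENAME)
print(missing["Code"], list(clean[0]))
```

Rows with an empty `Code` are kept on purpose here, mirroring the decision above to leave those entries in place.
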

**Clean Data:** [Here](data/01-modified-data/cleandata3a.csv)

**Overview:**

- Number of Columns: 4
- Number of Rows: 165
- Column Names: Country, Code, Year, Mismanaged Plastic
- Time Frame: 2019

 **3b. Mismanaged Plastic Waste**

**Raw Data:** [Here](data/00-raw-data/3b.csv)

**Cleaning Steps:** [Python Code](codes/02-data-cleaning/cleaning_data3b.py)

1. Importing Data into a Python Environment
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Counting number of NaN values in columns
4. Renaming columns to make headers shorter 
5. Exporting data as a CSV to clean data folder

**Clean Data:** [Here](data/01-modified-data/cleandata3b.csv)

**Overview:**

- Number of Columns: 5
- Number of Rows: 194
- Column Names: Country, 2010_Mismanaged_Waste, 2019_Mismanaged_Waste, Mismanaged_PlasticWaste_PerCapita_2010 (kg per year), Mismanaged_PlasticWaste_PerCapita_2019 (kg per year)
- Time Frame: 2010 and 2019


## Benefits of Investing in Sustainability

**4a. Aquaculture Production**

**Raw Data:** [Here](data/00-raw-data/4a.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data4a.r)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Getting Rid of Indicator, Subject and Frequency Columns because all rows have the same value 
4. Editing Column names to be more clear and concise 
5. Finding out time frame
6. Exporting as CSV to clean data folder
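
The linked script is in R; as an illustration only, this Python sketch shows one way to confirm that a column holds the same value in every row before dropping it. The sample rows and their placeholder values are made up; only the Indicator, Subject and Frequency column names come from the steps above.

```python
import csv
import io

# Made-up sample rows; column values are placeholders.
RAW = """Location,Indicator,Subject,Frequency,Time,Value
AAA,IND,TOT,A,1995,10
BBB,IND,TOT,A,1996,12
"""


def constant_columns(rows):
    """Return names of columns that hold a single distinct value."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


rows = list(csv.DictReader(io.StringIO(RAW)))
to_drop = constant_columns(rows)
clean = [{c: v for c, v in r.items() if c not in to_drop} for r in rows]
print(to_drop, list(clean[0]))
```
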

**Clean Data:** [Here](data/01-modified-data/cleandata4a.csv)

**Overview:**

- Number of Columns: 5
- Number of Rows: 2862
- Column Names: Location, Measure (Tonnes), Year, Value, Flag.codes
- Time Frame: 1995 - 2021

**4b. Marine Landings**

**Raw Data:** [Here](data/00-raw-data/4b.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data4b.Rmd)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Formatting Headers to correct years 
4. Adding NA to unknown values 
5. Exporting as CSV to clean data folder

**Clean Data:** [Here](data/01-modified-data/cleandata4b.csv)

**Overview:**

- Number of Columns: 22
- Number of Rows: 38
- Column Names: Country, 2000 - 2021
- Time Frame: 2000 - 2021


## What is Currently Being Done

**5b. Marine Pollution Act Numbers**

**Raw Data:** [Here](data/00-raw-data/5b.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data5b.Rmd)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Formatting Headers to correct years 
4. Adding NA to unknown values 
5. Exporting as CSV to clean data folder

**Clean Data:** [Here](data/01-modified-data/cleandata5b.csv)

**Overview:**

- Number of Columns: 24
- Number of Rows: 129
- Column Names: Country, 2000 - 2023
- Time Frame: 2000 - 2023

**5c. Money Going into Ocean Energy Research**

**Raw Data:** [Here](data/00-raw-data/5c.csv)

**Cleaning Steps:** [R Code](codes/02-data-cleaning/cleaning_data5c.Rmd)

1. Reading in Dataset as CSV to RStudio
2. Viewing Data to see errors / inconsistencies / modifications needed to be made
3. Formatting Headers to correct years 
4. Adding NA to unknown values 
5. Exporting as CSV to clean data folder

**Clean Data:** [Here](data/01-modified-data/cleandata5c.csv)

**Overview:**

- Number of Columns: 23
- Number of Rows: 35
- Column Names: Country, 2000 - 2022
- Time Frame: 2000 - 2022



