# Data Analysis and Visualization in Python
## Data workflows and automation
Questions
* Can I automate operations in Python?
* What are functions and why should I use them?

Objectives
* Describe why `for` loops are used in Python.
* Employ `for` loops to automate data analysis.
* Write unique filenames in Python.
* Build reusable code in Python.
* Write functions using conditional statements (`if`, `elif`, `else`).

## Loading our Data

In [None]:
import pandas as pd

# Load the data into a DataFrame
surveys_df = pd.read_csv('../data/surveys.csv')
species_df = pd.read_csv("../data/species.csv")

## Automating data processing using For Loops

In [None]:
import os

In [None]:
# Create a folder to hold one file per year
folder_years = "yearly_files"
os.mkdir(folder_years)

In [None]:
# List the contents of the current working directory
os.listdir('.')

In [None]:
# Select only data for 2002
surveys2002 = surveys_df[surveys_df['year'] == 2002]

# Write the new DataFrame to a csv file
surveys2002.to_csv(os.path.join(folder_years, "surveys_2002.csv"), index=False)
os.listdir(folder_years)

In [None]:
# List each distinct year that appears in the data
surveys_df['year'].unique()

In [None]:
for year in surveys_df['year'].unique():
    filename = os.path.join(folder_years, "surveys_" + str(year) + ".csv")
    print(filename)
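
Building a distinct filename for each value is the step that keeps one loop iteration from overwriting the previous one. A minimal sketch of the same idea with f-strings, which convert the integer year to text without an explicit `str()` call; the folder name and the list of years here are made up for illustration and do not come from the survey data:

```python
import os

folder = "yearly_files"     # hypothetical folder name, for illustration only
years = [1977, 1978, 2002]  # made-up sample years

# Build one path per year; the f-string formats the integer as text
filenames = [os.path.join(folder, f"surveys_{year}.csv") for year in years]

for name in filenames:
    print(name)
```

Nothing is written to disk here; the sketch only shows that each year produces its own path.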

In [None]:
for year in surveys_df['year'].unique():
    # Select data for the year
    surveys_year = surveys_df[surveys_df['year'] == year]

    # Write the new DataFrame to a csv file
    filename = os.path.join(folder_years, "surveys_" + str(year) + ".csv")
    surveys_year.to_csv(filename, index=False)

os.listdir(folder_years)
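
The loop above filters the full DataFrame once per year. As an alternative sketch, not the approach this lesson uses, pandas `groupby` can hand back each year's rows directly. The toy DataFrame below is invented for illustration, and the files go into a temporary folder so nothing is left behind:

```python
import os
import tempfile

import pandas as pd

# Made-up toy data standing in for surveys_df
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "year": [2001, 2001, 2002, 2003],
    "weight": [40.0, 48.0, 29.0, 46.0],
})

# Write into a temporary folder that is removed when the block ends
with tempfile.TemporaryDirectory() as tmp_folder:
    # groupby yields (key, sub-DataFrame) pairs, one per distinct year
    for year, year_df in toy_df.groupby("year"):
        filename = os.path.join(tmp_folder, f"surveys_{year}.csv")
        year_df.to_csv(filename, index=False)

    # Capture the listing before the temporary folder disappears
    written = sorted(os.listdir(tmp_folder))

print(written)
```

Both approaches should give one file per year; which one reads better is largely a matter of taste.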

### Exercises - Creating multiple CSV files
Instead of splitting out the data by years, a colleague wants to analyse each species separately. How would you write a unique csv file for each species?

In [None]:
folder_species = "species_files"
os.mkdir(folder_species)

In [None]:
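# Left join keeps every survey record; assumes both tables share a "species_id" column
merged_left = pd.merge(left=surveys_df, right=species_df, how='left', on="species_id")

# Skip missing IDs so the filename concatenation below never receives NaN
for species in merged_left['species_id'].dropna().unique():
    # Select data for the current species
    merged_left_species = merged_left[merged_left['species_id'] == species].dropna()

    # Write the new DataFrame to a csv file
    filename = os.path.join(folder_species, "surveys_" + str(species) + ".csv")
    merged_left_species.to_csv(filename, index=False)

os.listdir(folder_species)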

## Building reusable and modular code with functions

In [None]:
def one_year_csv_writer(all_data, folder_to_save, prefix, this_year):
    """
    Writes a csv file for data from a given year.

    all_data --- DataFrame with multi-year data
    folder_to_save --- folder to save the data files
    prefix --- prefix for the CSV file name
    this_year --- year for which data is extracted
    """

    # Select data for the year
    data_for_year = all_data[all_data['year'] == this_year]

    # Write the new DataFrame to a csv file
    filename = os.path.join(folder_to_save, prefix + str(this_year) + ".csv")
    data_for_year.to_csv(filename, index=False)

In [None]:
# Display the docstring we wrote for the function
help(one_year_csv_writer)

In [None]:
one_year_csv_writer(surveys_df, folder_years, "###_surveys_", 2002)
os.listdir(folder_years)

In [None]:
def yearly_data_csv_writer(all_data, folder_to_save, prefix, start_year, end_year):
    """
    Writes separate csv files for each year of data.

    all_data --- DataFrame with multi-year data
    folder_to_save --- folder to save the data files
    prefix --- prefix for the CSV file name
    start_year --- the first year of data we want
    end_year --- the last year of data we want
    """

    # "end_year" is the last year of data we want to pull, so we loop to end_year+1
    for year in range(start_year, end_year + 1):
        one_year_csv_writer(all_data, folder_to_save, prefix, year)

In [None]:
yearly_data_csv_writer(surveys_df, folder_years, "###_surveys_", ###, ###)
os.listdir(folder_years)
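
The `+ 1` matters because `range` stops one short of its second argument. A small sketch with made-up bounds shows the difference:

```python
start_year, end_year = 1995, 1998  # made-up example bounds

# range excludes its stop value, so the last year is dropped here
exclusive = list(range(start_year, end_year))

# Adding 1 to the stop value brings end_year back into the loop
inclusive = list(range(start_year, end_year + 1))

print(exclusive)  # [1995, 1996, 1997]
print(inclusive)  # [1995, 1996, 1997, 1998]
```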

## Testing arguments with conditions
* Test whether `folder_to_save` already exists, and create it if it does not. Hint:

```python
if 'dir_name_here' in os.listdir('.'):
    print('Processed directory exists')
else:
    os.mkdir('dir_name_here')
    print('Processed directory created')
```
* Use `None` as default `start_year` and `end_year`.
* Make the functions return a list of the files they have written.
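
The first two points can be sketched in isolation before they are built into the writer functions. This is one possible approach under stated assumptions, not the lesson's own solution: `ensure_folder` and `resolve_years` are hypothetical helper names, `os.path.isdir` is swapped in for the `os.listdir('.')` test from the hint so that nested paths also work, and `is None` is used so that only a missing argument triggers the default.

```python
import os
import tempfile


def ensure_folder(folder):
    """Hypothetical helper: create folder if missing; return True if created."""
    if os.path.isdir(folder):
        return False
    os.makedirs(folder)
    return True


def resolve_years(years, start_year=None, end_year=None):
    """Hypothetical helper: fall back to the data's own range when a bound is None."""
    if start_year is None:
        start_year = min(years)
    if end_year is None:
        end_year = max(years)
    return start_year, end_year


# Try the folder check inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "final")
    first_call = ensure_folder(target)   # folder is missing, so it is created
    second_call = ensure_folder(target)  # folder now exists, nothing to do

print(first_call, second_call)
print(resolve_years([1977, 1990, 2002]))
print(resolve_years([1977, 1990, 2002], start_year=1980))
```

Passing only `start_year` leaves `end_year` at its default, so the second bound still comes from the data.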

In [None]:
def one_year_csv_writer(all_data, folder_to_save, prefix, this_year):
    """
    Writes a csv file for data from a given year. Returns the filename.

    all_data --- DataFrame with multi-year data
    folder_to_save --- folder to save the data files
    prefix --- prefix for the CSV file name
    this_year --- year for which data is extracted
    """

    # Select data for the year
    data_for_year = all_data[all_data['year'] == this_year]

    # Write the new DataFrame to a csv file
    filename = os.path.join(folder_to_save, prefix + str(this_year) + ".csv")
    data_for_year.to_csv(filename, index=False)

    return filename

In [None]:
def yearly_data_csv_writer(all_data, folder_to_save, prefix,
                           start_year=None, end_year=None):
    """
    Modified from yearly_data_csv_writer to test default argument values!

    all_data --- DataFrame with multi-year data
    folder_to_save --- folder to save the data files
    prefix --- prefix for the CSV file name
    start_year --- the first year of data we want --- default: None - check all_data
    end_year --- the last year of data we want --- default: None - check all_data
    """

    # Create the output folder only if it is not already in the working directory
    if folder_to_save in os.listdir('.'):
        print('Processed directory exists')
    else:
        os.mkdir(folder_to_save)
        print('Processed directory created')

    # Fall back to the full range of years in the data when no bounds are given
    if not start_year:
        start_year = min(all_data['year'])
    if not end_year:
        end_year = max(all_data['year'])

    filenames = []

    # "end_year" is the last year of data we want to pull, so we loop to end_year+1
    for year in range(start_year, end_year + 1):
        filenames.append(one_year_csv_writer(all_data, folder_to_save, prefix, year))

    return filenames

In [None]:
yearly_data_csv_writer(surveys_df, 'final', 'results_', 1995, 1998)

In [None]:
yearly_data_csv_writer(surveys_df, 'final', 'results_')