# Coding Temple's Data Analytics Program
---
## Python for DA: Weekend Project

For this weekend project, you will be connecting to the [Disney API](https://disneyapi.dev/) to create an ETL pipeline. Your project should contain:

- etl_pipeline.py
    - Loads in data from the API object for all characters
    - Stores required fields from the API to a DataFrame
        - name
        - all movies/shows the character appeared in
        - any allies
        - any enemies
        - any park attractions
    - Cleans the data
    - Performs any transformations/feature engineering you wish to complete
    - Stores the data in an ElephantSQL server
    - Stores the data in a .csv file

- notebook.ipynb
    - Contains all cells you used to test your code before loading it into the pipeline
    - Loads in the data from your .csv file
    - Conduct EDA through data
    - Conduct an analysis on your dataset!

In [60]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import requests, json



In [106]:
def charac_dict(ind_data) -> dict:
    character_takein = {
        'name':ind_data['name'],
        'id':ind_data['_id'],
        'films':ind_data['films'],
        'tv_shows':ind_data['tvShows'],
        'video_games':ind_data['videoGames'], # Leaving this one in, it does only have 16.5% information. But I do like to see Kingdom Hearts be relevant for once.
        # 'short_films':ind_data['shortFilms'], 
        # 'park_attractions': ind_data['parkAttractions'],
        # 'allies':ind_data['allies'],
        # 'enemies':ind_data['enemies'],
        'source_url':ind_data['sourceUrl'],
        'api_url':ind_data['url']
    }
    return character_takein


def wrangle(file_path):
    disney_page1 = (requests.get(file_path)).json()
    
    charac_info = {}
    while disney_page1['info']['nextPage'] != None:
        for i in range(len(disney_page1['data'])):
            character = charac_dict(disney_page1['data'][i])
            charac_info[(character['name'])] = character
        disney_page1 =(requests.get(disney_page1['info']['nextPage'])).json()
    
    df = pd.DataFrame.from_dict(charac_info)
    tpose_df = df.transpose().reset_index() # This is to flip the columns and rows.
    tpose_df.drop('index',axis=1,inplace=True) #This is to remove the original keys of the dictionary
    
    for name in tpose_df.columns.tolist():
        for e in range(len(tpose_df[name])):
            if isinstance(tpose_df[name][e], list) and len(tpose_df[name][e]) > 0:
                tpose_df[name][e] = ", ".join(tpose_df[name][e])
    # This is to iterate through each value and to remove the lists that are not empty
    return tpose_df


df = wrangle('https://api.disneyapi.dev/character')


0       The Hunchback of Notre Dame, The Hunchback of ...
1          The Fox and the Hound, The Fox and the Hound 2
2                                                 Cheetah
3               Mary Poppins (film), Mary Poppins Returns
4                                                      []
                              ...                        
6976                                                   []
6977                Sofia the First: Once Upon a Princess
6978    A Wrinkle in Time (film), A Wrinkle in Time (2...
6979                                                   []
6980                                                   []
Name: films, Length: 6981, dtype: object


In [78]:
df.info()
def disnull(clean_df):
    """
    A code block pulled from Tuesday's homework and retrofitted into a function.
    This function's purpose is to iterate through a Pandas DataFrame and return
    a dictionary with keys telling you the amount of null values and non null values.
    
    Taken apart and retrofitted to check empty lists instead
    """
    null_check = {}
    for names in clean_df.columns.tolist():
        null_checkin = [{'nonull':0},{'isnull':0}]
        if names not in ['name','id','source_url','api_url']:
            for i in range(len(clean_df[names])):
                if len(clean_df[names][i]) == 0:
                    null_checkin[1]['isnull'] += 1
                else:
                    null_checkin[0]['nonull'] += 1
            null_check[f'{names}']=null_checkin
    return null_check

print(disnull(df))
print(1152/6981)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6981 entries, 0 to 6980
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         6981 non-null   object
 1   id           6981 non-null   object
 2   films        6981 non-null   object
 3   tv_shows     6981 non-null   object
 4   video_games  6981 non-null   object
 5   source_url   6981 non-null   object
 6   api_url      6981 non-null   object
dtypes: object(7)
memory usage: 381.9+ KB
{'films': [{'nonull': 3115}, {'isnull': 3866}], 'tv_shows': [{'nonull': 4293}, {'isnull': 2688}], 'video_games': [{'nonull': 1152}, {'isnull': 5829}]}
0.16501933820369574


Going to be ripping out a bunch of stuff that doesn't have any information.

In [93]:
df['films']

for i in range(len(df['films'])):
    if isinstance((df['films'][i]), list) and len(df['films'][i]) > 0:
        df['films'][i] = ", ".join(df['films'][i])
df['films']

0       The Hunchback of Notre Dame, The Hunchback of ...
1          The Fox and the Hound, The Fox and the Hound 2
2                                                 Cheetah
3               Mary Poppins (film), Mary Poppins Returns
4                                                      []
                              ...                        
6976                                                   []
6977                Sofia the First: Once Upon a Princess
6978    A Wrinkle in Time (film), A Wrinkle in Time (2...
6979                                                   []
6980                                                   []
Name: films, Length: 6981, dtype: object

In [31]:
disney_page1 = (requests.get('https://api.disneyapi.dev/character')).json()
disney_page1['data'][5] # this is to look at where I need to refer to while making the character dictionary


{'_id': 12,
 'films': [],
 'shortFilms': [],
 'tvShows': ['Pickle and Peanut'],
 'videoGames': [],
 'parkAttractions': [],
 'allies': [],
 'enemies': [],
 'sourceUrl': 'https://disney.fandom.com/wiki/90%27s_Adventure_Bear_(character)',
 'name': "90's Adventure Bear",
 'imageUrl': 'https://static.wikia.nocookie.net/disney/images/3/3f/90%27s_Adventure_Bear_profile.png',
 'createdAt': '2021-04-12T01:26:00.335Z',
 'updatedAt': '2021-12-20T20:39:18.032Z',
 'url': 'https://api.disneyapi.dev/characters/12',
 '__v': 0}