# Acquire 
in Time Series

## Exercises

Create a new local git repository and remote repository on github named time-series-exercises. Save this work for this module in your time-series-exercises repo.

The end result of this exercise should be a file named acquire.py.

### 1. 

In [4]:
import pandas as pd
import requests
import math
import os

pd.set_option("display.max_rows", 6)
pd.set_option("display.max_columns", 50)

In [3]:
# Using the code from the lesson as a guide and the REST API from https://swapi.dev/ as we did in the lesson,
# create a dataframe named people that has all of the data for people.

In [9]:
def get_swapi_api(endpoint):
    # If endpoint csv exists, use it
    if os.path.isfile(f"{endpoint}.csv"):
        print('CSV file found.')
        return pd.read_csv(f"{endpoint}.csv")
    else:
        # Find how many pages the api has
        response = requests.get(f"https://swapi.dev/api/{endpoint}/")
        data = response.json()
        # Assume data is a dictionary containing the results and count
        num_results = len(data["results"])
        num_pages = math.ceil((data["count"] / num_results))
        # Create an empty dataframe
        endpoint_df = pd.DataFrame()
        # Create a loop to get all pages
        for i in range(1, (num_pages)+1):
            print(f"{i} of {num_pages}")
            # Get the data from the API
            response = requests.get(f"https://swapi.dev/api/{endpoint}/?page=" + str(i))
            # Convert the data to json
            data = response.json()
            # Convert the json to a dataframe
            df = pd.DataFrame(data["results"])
            # Concat the dataframes
            endpoint_df = pd.concat([endpoint_df, df])
            # Reset the index of the df
            endpoint_df = endpoint_df.reset_index(drop=True)
        return endpoint_df

In [10]:
people = get_swapi_api("people")
people

1 of 9
2 of 9
3 of 9
4 of 9
5 of 9
6 of 9
7 of 9
8 of 9
9 of 9


Unnamed: 0,name,height,mass,hair_color,skin_color,eye_color,birth_year,gender,homeworld,films,species,vehicles,starships,created,edited,url
0,Luke Skywalker,172,77,blond,fair,blue,19BBY,male,https://swapi.dev/api/planets/1/,"[https://swapi.dev/api/films/1/, https://swapi...",[],"[https://swapi.dev/api/vehicles/14/, https://s...","[https://swapi.dev/api/starships/12/, https://...",2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,https://swapi.dev/api/people/1/
1,C-3PO,167,75,,gold,yellow,112BBY,,https://swapi.dev/api/planets/1/,"[https://swapi.dev/api/films/1/, https://swapi...",[https://swapi.dev/api/species/2/],[],[],2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,https://swapi.dev/api/people/2/
2,R2-D2,96,32,,"white, blue",red,33BBY,,https://swapi.dev/api/planets/8/,"[https://swapi.dev/api/films/1/, https://swapi...",[https://swapi.dev/api/species/2/],[],[],2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,https://swapi.dev/api/people/3/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,Raymus Antilles,188,79,brown,light,brown,unknown,male,https://swapi.dev/api/planets/2/,"[https://swapi.dev/api/films/1/, https://swapi...",[],[],[],2014-12-20T19:49:35.583000Z,2014-12-20T21:17:50.493000Z,https://swapi.dev/api/people/81/
80,Sly Moore,178,48,none,pale,white,unknown,female,https://swapi.dev/api/planets/60/,"[https://swapi.dev/api/films/5/, https://swapi...",[],[],[],2014-12-20T20:18:37.619000Z,2014-12-20T21:17:50.496000Z,https://swapi.dev/api/people/82/
81,Tion Medon,206,80,none,grey,black,unknown,male,https://swapi.dev/api/planets/12/,[https://swapi.dev/api/films/6/],[https://swapi.dev/api/species/37/],[],[],2014-12-20T20:35:04.260000Z,2014-12-20T21:17:50.498000Z,https://swapi.dev/api/people/83/


### 2. 

In [11]:
# Get planets data
planets = get_swapi_api("planets")
planets

1 of 6
2 of 6
3 of 6
4 of 6
5 of 6
6 of 6


Unnamed: 0,name,rotation_period,orbital_period,diameter,climate,gravity,terrain,surface_water,population,residents,films,created,edited,url
0,Tatooine,23,304,10465,arid,1 standard,desert,1,200000,"[https://swapi.dev/api/people/1/, https://swap...","[https://swapi.dev/api/films/1/, https://swapi...",2014-12-09T13:50:49.641000Z,2014-12-20T20:58:18.411000Z,https://swapi.dev/api/planets/1/
1,Alderaan,24,364,12500,temperate,1 standard,"grasslands, mountains",40,2000000000,"[https://swapi.dev/api/people/5/, https://swap...","[https://swapi.dev/api/films/1/, https://swapi...",2014-12-10T11:35:48.479000Z,2014-12-20T20:58:18.420000Z,https://swapi.dev/api/planets/2/
2,Yavin IV,24,4818,10200,"temperate, tropical",1 standard,"jungle, rainforests",8,1000,[],[https://swapi.dev/api/films/1/],2014-12-10T11:37:19.144000Z,2014-12-20T20:58:18.421000Z,https://swapi.dev/api/planets/3/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,Shili,unknown,unknown,unknown,temperate,1,"cities, savannahs, seas, plains",unknown,unknown,[https://swapi.dev/api/people/78/],[],2014-12-20T18:43:14.049000Z,2014-12-20T20:58:18.521000Z,https://swapi.dev/api/planets/58/
58,Kalee,23,378,13850,"arid, temperate, tropical",1,"rainforests, cliffs, canyons, seas",unknown,4000000000,[https://swapi.dev/api/people/79/],[],2014-12-20T19:43:51.278000Z,2014-12-20T20:58:18.523000Z,https://swapi.dev/api/planets/59/
59,Umbara,unknown,unknown,unknown,unknown,unknown,unknown,unknown,unknown,[https://swapi.dev/api/people/82/],[],2014-12-20T20:18:36.256000Z,2014-12-20T20:58:18.525000Z,https://swapi.dev/api/planets/60/


### 3. 

In [12]:
# Extract the data for starships.
starships = get_swapi_api("starships")
starships

1 of 4
2 of 4
3 of 4
4 of 4


Unnamed: 0,name,model,manufacturer,cost_in_credits,length,max_atmosphering_speed,crew,passengers,cargo_capacity,consumables,hyperdrive_rating,MGLT,starship_class,pilots,films,created,edited,url
0,CR90 corvette,CR90 corvette,Corellian Engineering Corporation,3500000,150,950,30-165,600,3000000,1 year,2.0,60,corvette,[],"[https://swapi.dev/api/films/1/, https://swapi...",2014-12-10T14:20:33.369000Z,2014-12-20T21:23:49.867000Z,https://swapi.dev/api/starships/2/
1,Star Destroyer,Imperial I-class Star Destroyer,Kuat Drive Yards,150000000,1600,975,47060,,36000000,2 years,2.0,60,Star Destroyer,[],"[https://swapi.dev/api/films/1/, https://swapi...",2014-12-10T15:08:19.848000Z,2014-12-20T21:23:49.870000Z,https://swapi.dev/api/starships/3/
2,Sentinel-class landing craft,Sentinel-class landing craft,"Sienar Fleet Systems, Cyngus Spaceworks",240000,38,1000,5,75,180000,1 month,1.0,70,landing craft,[],[https://swapi.dev/api/films/1/],2014-12-10T15:48:00.586000Z,2014-12-20T21:23:49.873000Z,https://swapi.dev/api/starships/5/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33,Banking clan frigte,Munificent-class star frigate,"Hoersch-Kessel Drive, Inc, Gwori Revolutionary...",57000000,825,unknown,200,unknown,40000000,2 years,1.0,unknown,cruiser,[],[https://swapi.dev/api/films/6/],2014-12-20T20:07:11.538000Z,2014-12-20T21:23:49.956000Z,https://swapi.dev/api/starships/68/
34,Belbullab-22 starfighter,Belbullab-22 starfighter,Feethan Ottraw Scalable Assemblies,168000,6.71,1100,1,0,140,7 days,6,unknown,starfighter,"[https://swapi.dev/api/people/10/, https://swa...",[https://swapi.dev/api/films/6/],2014-12-20T20:38:05.031000Z,2014-12-20T21:23:49.959000Z,https://swapi.dev/api/starships/74/
35,V-wing,Alpha-3 Nimbus-class V-wing starfighter,Kuat Systems Engineering,102500,7.9,1050,1,0,60,15 hours,1.0,unknown,starfighter,[],[https://swapi.dev/api/films/6/],2014-12-20T20:43:04.349000Z,2014-12-20T21:23:49.961000Z,https://swapi.dev/api/starships/75/


### 4. 

In [13]:
# Save the data in your files to local csv files so that it will be faster to access in the future.

# Make people into a CSV
people.to_csv("people.csv", index=False)

# Starships CSV
starships.to_csv("starships.csv", index=False)

# Planets CSV
planets.to_csv("planets.csv", index=False)

### 5. 

In [14]:
# Combine the data from your three separate dataframes into one large dataframe.

def get_star_wars_data(people, planets, starships):
    # Add planet_ to the beginning of each planet column and join people['homeworld'] with planets['url']
    planets = planets.add_prefix("planet_")
    starships = starships.add_prefix("starship_")

    # Remove brackets and use split to change people.starships from a python list
    people["starships"] = people["starships"].str.replace("[", "")
    people["starships"] = people["starships"].str.replace("]", "")
    people["starships"] = people["starships"].str.replace("'", "")
    people["starships"] = people["starships"].str.split(",")

    # Explode people.starships
    people = people.explode("starships")

    # Join. People column 'homeworld' matches planets column 'planet_url'.
    df = people.join(planets.set_index("planet_url"), on="homeworld")

    # Join. People column 'starship' matches starship column 'starship_url'.
    df = df.join(starships.set_index("starship_url"), on="starships")

    df = df[
        [
            "name",
            "height",
            "mass",
            "hair_color",
            "skin_color",
            "eye_color",
            "birth_year",
            "gender",
            "planet_name",
            "planet_climate",
            "planet_gravity",
            "planet_terrain",
            "planet_surface_water",
            "planet_population",
            "planet_created",
            "starship_name",
            "starship_model",
            "starship_manufacturer",
            "starship_cost_in_credits",
            "starship_length",
            "starship_max_atmosphering_speed",
            "starship_cargo_capacity",
            "starship_consumables",
            "starship_hyperdrive_rating",
            "starship_starship_class",
            "starship_created",
        ]
    ]
    return df

In [15]:
people = get_swapi_api("people")
planets = get_swapi_api("planets")
starships = get_swapi_api("starships")

df = get_star_wars_data(people, planets, starships)

CSV file found.
CSV file found.
CSV file found.


In [16]:
df

Unnamed: 0,name,height,mass,hair_color,skin_color,eye_color,birth_year,gender,planet_name,planet_climate,planet_gravity,planet_terrain,planet_surface_water,planet_population,planet_created,starship_name,starship_model,starship_manufacturer,starship_cost_in_credits,starship_length,starship_max_atmosphering_speed,starship_cargo_capacity,starship_consumables,starship_hyperdrive_rating,starship_starship_class,starship_created
0,Luke Skywalker,172,77,blond,fair,blue,19BBY,male,Tatooine,arid,1 standard,desert,1,200000,2014-12-09T13:50:49.641000Z,X-wing,T-65 X-wing,Incom Corporation,149999,12.5,1050,110,1 week,1.0,Starfighter,2014-12-12T11:19:05.340000Z
0,Luke Skywalker,172,77,blond,fair,blue,19BBY,male,Tatooine,arid,1 standard,desert,1,200000,2014-12-09T13:50:49.641000Z,,,,,,,,,,,
1,C-3PO,167,75,,gold,yellow,112BBY,,Tatooine,arid,1 standard,desert,1,200000,2014-12-09T13:50:49.641000Z,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,Raymus Antilles,188,79,brown,light,brown,unknown,male,Alderaan,temperate,1 standard,"grasslands, mountains",40,2000000000,2014-12-10T11:35:48.479000Z,,,,,,,,,,,
80,Sly Moore,178,48,none,pale,white,unknown,female,Umbara,unknown,unknown,unknown,unknown,unknown,2014-12-20T20:18:36.256000Z,,,,,,,,,,,
81,Tion Medon,206,80,none,grey,black,unknown,male,Utapau,"temperate, arid, windy",1 standard,"scrublands, savanna, canyons, sinkholes",0.9,95000000,2014-12-10T12:49:01.491000Z,,,,,,,,,,,


### 6.

In [None]:
# Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years.
# The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017.
# You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

# Read the data into a dataframe.
opsd = pd.read_csv(
    "https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv"
)


### 7. 


In [None]:
# Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the
# # acquire.py file and be able to re-run the functions and get the same data.

### All of these functions have been put into a acquire.py

#### Function for pulling swapi API

In [None]:
def get_swapi_api(endpoint):
    # If endpoint csv exists, use it
    if os.path.isfile(f"{endpoint}.csv"):
        return pd.read_csv(f"{endpoint}.csv")
    else:
        # Find how many pages the api has
        response = requests.get(f"https://swapi.dev/api/{endpoint}/")
        data = response.json()
        # Assume data is a dictionary containing the results and count
        num_results = len(data["results"])
        num_pages = math.ceil(data["count"] / num_results)
        # Create an empty dataframe
        endpoint_df = pd.DataFrame()
        # Create a loop to get all pages
        for i in range(1, (num_pages + 1)):
            # Get the data from the API
            response = requests.get(f"https://swapi.dev/api/{endpoint}/?page=" + str(i))
            # Convert the data to json
            data = response.json()
            # Convert the json to a dataframe
            df = pd.DataFrame(data["results"])
            # Concat the dataframes
            endpoint_df = pd.concat([endpoint_df, df])
            # Reset the index of the df
            endpoint_df = endpoint_df.reset_index(drop=True)
        return endpoint_df

#### Function for reading in multiple CSVs

In [None]:
# Create a function to read in a list of CSV names, and return them as a dataframe
def read_csvs(csv_list):
    # Create an empty dataframe
    df = pd.DataFrame()
    # Create a loop to read in each csv
    for csv in csv_list:
        # Read in the csv
        df_csv = pd.read_csv(csv)
        # Concatenate the csv to the dataframe
        df = pd.concat([df, df_csv])
    # Reset the index
    df = df.reset_index(drop=True)
    return df

In [None]:
list = ["people.csv"]

people = read_csvs(list)

In [None]:
csvs = ["people.csv", "starships.csv", "planets.csv"]

df = read_csvs(csvs)

In [None]:
df

Unnamed: 0,name,height,mass,hair_color,skin_color,eye_color,birth_year,gender,homeworld,films,species,vehicles,starships,created,edited,...,cargo_capacity,consumables,hyperdrive_rating,MGLT,starship_class,pilots,rotation_period,orbital_period,diameter,climate,gravity,terrain,surface_water,population,residents
0,Luke Skywalker,172,77,blond,fair,blue,19BBY,male,Tatooine,"['https://swapi.dev/api/films/1/', 'https://sw...",[],"['https://swapi.dev/api/vehicles/14/', 'https:...","['https://swapi.dev/api/starships/12/', 'https...",2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,...,,,,,,,,,,,,,,,
1,C-3PO,167,75,,gold,yellow,112BBY,,Tatooine,"['https://swapi.dev/api/films/1/', 'https://sw...",['https://swapi.dev/api/species/2/'],[],[],2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,...,,,,,,,,,,,,,,,
2,R2-D2,96,32,,"white, blue",red,33BBY,,Naboo,"['https://swapi.dev/api/films/1/', 'https://sw...",['https://swapi.dev/api/species/2/'],[],[],2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,...,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,Shili,,,,,,,,,[],,,,2014-12-20T18:43:14.049000Z,2014-12-20T20:58:18.521000Z,...,,,,,,,unknown,unknown,unknown,temperate,1,"cities, savannahs, seas, plains",unknown,unknown,['https://swapi.dev/api/people/78/']
58,Kalee,,,,,,,,,[],,,,2014-12-20T19:43:51.278000Z,2014-12-20T20:58:18.523000Z,...,,,,,,,23,378,13850,"arid, temperate, tropical",1,"rainforests, cliffs, canyons, seas",unknown,4000000000,['https://swapi.dev/api/people/79/']
59,Umbara,,,,,,,,,[],,,,2014-12-20T20:18:36.256000Z,2014-12-20T20:58:18.525000Z,...,,,,,,,unknown,unknown,unknown,unknown,unknown,unknown,unknown,unknown,['https://swapi.dev/api/people/82/']


In [None]:
def concat_dfs(df_list, axis=0):
    df = pd.concat(df_list, axis=axis)
    return df

#### Function for pulling Open Power Systems Data (OPSD)

In [None]:
def get_opsd():
    opsd = pd.read_csv(
        "https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv"
    )
    return opsd

## Random Practice

#### Test to fill colums that have a single API link

#### Pandas .explode

In [168]:
def get_star_wars_data(people, planets, starships):
    # Add planet_ to the beginning of each planet column and join people['homeworld'] with planets['url']
    planets = planets.add_prefix("planet_")
    starships = starships.add_prefix("starship_")

    # Remove brackets and use split to change people.starships from a python list
    people["starships"] = people["starships"].str.replace("[", "")
    people["starships"] = people["starships"].str.replace("]", "")
    people["starships"] = people["starships"].str.replace("'", "")
    people["starships"] = people["starships"].str.split(",")

    # Explode people.starships
    people = people.explode("starships")

    # Join. People column 'homeworld' matches planets column 'planet_url'.
    df = people.join(planets.set_index("planet_url"), on="homeworld")

    # Join. People column 'starship' matches starship column 'starship_url'.
    df = df.join(starships.set_index("starship_url"), on="starships")

    df = df[
        [
            "name",
            "height",
            "mass",
            "hair_color",
            "skin_color",
            "eye_color",
            "birth_year",
            "gender",
            "planet_name",
            "planet_climate",
            "planet_gravity",
            "planet_terrain",
            "planet_surface_water",
            "planet_population",
            "planet_created",
            "starship_name",
            "starship_model",
            "starship_manufacturer",
            "starship_cost_in_credits",
            "starship_length",
            "starship_max_atmosphering_speed",
            "starship_cargo_capacity",
            "starship_consumables",
            "starship_hyperdrive_rating",
            "starship_starship_class",
            "starship_created",
        ]
    ]

    return df

## Notes

### Time Series Overview Notes

<u>*Look into Facebook prophet model for interest in Time Series Analysis (one of, if not the, greatest seasonal model)*

*Also SAREMA*

- Acquisition
    - JSON is like a list of dictionaries

- Prepare
    - deals with a lot more dates

- Explore
     - Special split for time series
     - Essentially time splits, but situational
     - Seasonality or not determines how to split
     
- Visuals
     - https://chart.guide/charts/time-or-trend/



### Time Series

#### REST

Representational State Transfer (REST) is a software architectural style that defines a set of constraints to be used for creating web services. RESTful web services allow clients to access and manipulate web resources using a uniform and predefined set of stateless operations.

#### API

An Application Programming Interface (API) is a set of protocols, routines, and tools for building software applications. In the context of time series analysis, APIs can be used to access and manipulate time series data from various sources.

#### JSON

JavaScript Object Notation (JSON) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. JSON is commonly used for transmitting data between a server and a web application, and can be used to represent time series data.

#### URI

Uniform Resource Identifier (URI) is a string of characters that identifies a name or a resource on the Internet. In the context of time series analysis, URIs can be used to identify and access time series data from various sources.

#### CRUD

CRUD stands for Create, Read, Update, and Delete, which are the four basic functions of persistent storage. In the context of time series analysis, CRUD operations can be used to create, read, update, and delete time series data from a database or other storage system.

#### Temporal

Temporal refers to the dimension of time in data. In the context of time series analysis, temporal data refers to data that is collected over time, such as stock prices, weather data, or sensor readings.

#### Periodic

Periodic refers to data that exhibits a repeating pattern over time. In the context of time series analysis, periodic data can be modeled using techniques such as Fourier analysis or seasonal decomposition.

#### Resampling in Time Series

Resampling in time series refers to the process of changing the frequency of a time series. This can involve upsampling (increasing the frequency of the data) or downsampling (decreasing the frequency of the data).

#### Stationary Process

A stationary process is a stochastic process whose statistical properties (such as mean and variance) do not change over time. In the context of time series analysis, stationary processes are often assumed in order to simplify modeling and analysis.

#### Trend

Trend refers to the long-term pattern or direction of a time series. In the context of time series analysis, trends can be modeled using techniques such as linear regression or exponential smoothing.

#### Seasonality

Seasonality refers to the repeating pattern of a time series over a fixed period of time (such as a day, week, or year). In the context of time series analysis, seasonality can be modeled using techniques such as seasonal decomposition or Fourier analysis.

#### Heteroskedasticity

Heteroskedasticity refers to the phenomenon where the variance of a time series changes over time. In the context of time series analysis, heteroskedasticity can complicate modeling and analysis, and may require techniques such as generalized autoregressive conditional heteroskedasticity (GARCH) models.

#### Autocorrelation

Autocorrelation refers to the correlation between a time series and a lagged version of itself. In the context of time series analysis, autocorrelation can be used to identify patterns and dependencies in the data, and can be modeled using techniques such as autoregressive integrated moving average (ARIMA) models.