# Obtaining Data and Cleaning Data

Pandas is a great library for working with data. To make good use of it, there are two things you'll need to know how to do: 

- How to obtain data from an API
- How to clean data 

Throughout this notebook, we'll look at pulling data from an API (application programming interface), formatting it, analyzing it, and cleaning it! First, let's import the packages we'll need for our first calls: 

In [1]:
import numpy as np
import pandas as pd

from pandas import DataFrame
from pandas import Series
import requests

We've worked with numpy, pandas, Series, and DataFrame before. What's new in this notebook so far is the `requests` package. If you do not have the requests library already installed, execute the following command: 

```bash
pip install requests
```

Requests is how we make HTTP requests from our Python code. It works much like a web browser does, but instead of calling a website that returns a GUI, we'll be hitting a web service that returns only data. The API we'll be using is SWAPI, a Star Wars API, so we can learn some things about the Star Wars movies! If you want to check out the visual side of the site, head to https://swapi.dev, where you can explore all the available SWAPI calls. For now, let's call the `people` endpoint to get some character data. 
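Under the hood, `requests.get` builds an HTTP GET request from a URL plus any optional query parameters. The sketch below does that step offline with `requests.Request(...).prepare()`, so nothing is sent over the network. The `page` parameter is an assumption here, borrowed from the `?page=2` style links SWAPI returns later in this notebook.

```python
import requests

# Build (but do not send) a GET request for the people endpoint.
# The 'page' query parameter mirrors the ?page=N links SWAPI returns.
prepared = requests.Request(
    "GET",
    "https://swapi.dev/api/people/",
    params={"page": 2},
).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://swapi.dev/api/people/?page=2

# Sending it for real would look like this (needs network access):
# response = requests.get("https://swapi.dev/api/people/", params={"page": 2}, timeout=10)
```

Passing `params` lets requests handle the URL encoding for you, and a `timeout` keeps a notebook cell from hanging indefinitely if the server never answers.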

In [9]:
response = requests.get("https://swapi.dev/api/people/")
response

<Response [200]>

Now that we have our response, we need to get its data into a usable format. Luckily, pandas is really good at reading lots of formats, and it can easily take in JSON. If you're interested in learning more about JSON, check out their [webpage](https://www.json.org).  
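`response.json()` parses the response body as JSON and hands back ordinary Python objects. A rough offline equivalent uses the standard library's `json.loads` on the body text; the sample string below is a trimmed, hand-written stand-in for the real SWAPI response, not actual API output.

```python
import json

# A trimmed stand-in for response.text, shaped like the SWAPI output below.
body = '''
{"count": 87,
 "next": "https://swapi.co/api/people/?page=2",
 "previous": null,
 "results": [{"name": "Luke Skywalker", "height": "172"},
             {"name": "C-3PO", "height": "167"}]}
'''

data = json.loads(body)            # JSON object -> dict, array -> list, null -> None
print(type(data).__name__)         # dict
print(data["previous"])            # None
print(data["results"][0]["name"])  # Luke Skywalker
print(data.get("missing", "n/a"))  # n/a (default when a key is absent)
```

Note that numbers such as `height` arrive as strings because the API sends them quoted; converting them is part of the cleaning work.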

In [None]:
jsonResponse = response.json()
jsonResponse

{'count': 87,
 'next': 'https://swapi.co/api/people/?page=2',
 'previous': None,
 'results': [{'name': 'Luke Skywalker',
   'height': '172',
   'mass': '77',
   'hair_color': 'blond',
   'skin_color': 'fair',
   'eye_color': 'blue',
   'birth_year': '19BBY',
   'gender': 'male',
   'homeworld': 'https://swapi.co/api/planets/1/',
   'films': ['https://swapi.co/api/films/2/',
    'https://swapi.co/api/films/6/',
    'https://swapi.co/api/films/3/',
    'https://swapi.co/api/films/1/',
    'https://swapi.co/api/films/7/'],
   'species': ['https://swapi.co/api/species/1/'],
   'vehicles': ['https://swapi.co/api/vehicles/14/',
    'https://swapi.co/api/vehicles/30/'],
   'starships': ['https://swapi.co/api/starships/12/',
    'https://swapi.co/api/starships/22/'],
   'created': '2014-12-09T13:50:51.644000Z',
   'edited': '2014-12-20T21:17:56.891000Z',
   'url': 'https://swapi.co/api/people/1/'},
  {'name': 'C-3PO',
   'height': '167',
   'mass': '75',
   'hair_color': 'n/a',
   'skin_color'

In Python, a parsed JSON object is just a dictionary, so we can access its data by key. To get the list of characters out of `jsonResponse`, we use `'results'` as the key: 

In [5]:
jsonBody = jsonResponse['results']
jsonBody

[{'name': 'Luke Skywalker',
  'height': '172',
  'mass': '77',
  'hair_color': 'blond',
  'skin_color': 'fair',
  'eye_color': 'blue',
  'birth_year': '19BBY',
  'gender': 'male',
  'homeworld': 'https://swapi.co/api/planets/1/',
  'films': ['https://swapi.co/api/films/2/',
   'https://swapi.co/api/films/6/',
   'https://swapi.co/api/films/3/',
   'https://swapi.co/api/films/1/',
   'https://swapi.co/api/films/7/'],
  'species': ['https://swapi.co/api/species/1/'],
  'vehicles': ['https://swapi.co/api/vehicles/14/',
   'https://swapi.co/api/vehicles/30/'],
  'starships': ['https://swapi.co/api/starships/12/',
   'https://swapi.co/api/starships/22/'],
  'created': '2014-12-09T13:50:51.644000Z',
  'edited': '2014-12-20T21:17:56.891000Z',
  'url': 'https://swapi.co/api/people/1/'},
 {'name': 'C-3PO',
  'height': '167',
  'mass': '75',
  'hair_color': 'n/a',
  'skin_color': 'gold',
  'eye_color': 'yellow',
  'birth_year': '112BBY',
  'gender': 'n/a',
  'homeworld': 'https://swapi.co/api/pl

The results are a list of JSON objects (Python dictionaries), and we can pass that list straight into the `DataFrame` constructor, which converts it into a DataFrame: 

In [6]:
starwarsDF = DataFrame(jsonBody)
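When the constructor receives a list of dictionaries, each dictionary becomes a row and the union of their keys becomes the columns; a key missing from one record shows up as `NaN` in that row. The two records below are hand-written for illustration, not real API data. Recent pandas versions keep columns in the order the keys are first seen; older releases sorted them alphabetically, which would explain the column order in this notebook's outputs.

```python
import pandas as pd

records = [
    {"name": "Luke Skywalker", "height": "172", "mass": "77"},
    {"name": "R2-D2", "height": "96"},          # no 'mass' key in this record
]

df = pd.DataFrame(records)
print(df)
print(df.columns.tolist())           # ['name', 'height', 'mass']
print(df["mass"].isna().tolist())    # [False, True]
```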

If we had wanted to save the response to a file and read it back into a DataFrame from the file system, we'd use `pd.read_json`. 
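A minimal sketch of that round trip, assuming a temporary file and two hand-written records: `json.dump` writes the list of records to disk, and `pd.read_json` with `orient="records"` reads it back as a DataFrame. Be aware that `read_json` also tries to infer column types by default, so numeric-looking strings may come back as numbers.

```python
import json
import os
import tempfile

import pandas as pd

records = [
    {"name": "Luke Skywalker", "birth_year": "19BBY"},
    {"name": "C-3PO", "birth_year": "112BBY"},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "people.json")

    # Write the records to a .json file.
    with open(path, "w") as f:
        json.dump(records, f)

    # Read the file back; orient="records" means "a JSON array of row objects".
    fromFile = pd.read_json(path, orient="records")

print(fromFile)
```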

Let's start exploring our data with the `head` and `tail` methods, which by default show the first and last five rows:  

In [7]:
starwarsDF.head()

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]


In [8]:
starwarsDF.tail()

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


Curiously, `head` returns indices `0-4` and `tail` returns `5-9`, so together they cover only ten rows. Let's look at the whole DataFrame: 

In [9]:
starwarsDF

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


To make sure we're not missing something, let's just double check the entire length: 

In [10]:
len(starwarsDF)

10

Only ten rows, so clearly we missed something. Let's look back at our JSON data to see what: 

In [11]:
jsonResponse

{'count': 87,
 'next': 'https://swapi.co/api/people/?page=2',
 'previous': None,
 'results': [{'name': 'Luke Skywalker',
   'height': '172',
   'mass': '77',
   'hair_color': 'blond',
   'skin_color': 'fair',
   'eye_color': 'blue',
   'birth_year': '19BBY',
   'gender': 'male',
   'homeworld': 'https://swapi.co/api/planets/1/',
   'films': ['https://swapi.co/api/films/2/',
    'https://swapi.co/api/films/6/',
    'https://swapi.co/api/films/3/',
    'https://swapi.co/api/films/1/',
    'https://swapi.co/api/films/7/'],
   'species': ['https://swapi.co/api/species/1/'],
   'vehicles': ['https://swapi.co/api/vehicles/14/',
    'https://swapi.co/api/vehicles/30/'],
   'starships': ['https://swapi.co/api/starships/12/',
    'https://swapi.co/api/starships/22/'],
   'created': '2014-12-09T13:50:51.644000Z',
   'edited': '2014-12-20T21:17:56.891000Z',
   'url': 'https://swapi.co/api/people/1/'},
  {'name': 'C-3PO',
   'height': '167',
   'mass': '75',
   'hair_color': 'n/a',
   'skin_color'

Upon further inspection, the response holds more than `results`: `count` says there are 87 characters in total, and `next` and `previous` hold the URLs of the neighboring pages. Let's see what happens if we call that `next` URL: 
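The `count` field also tells us how much work is left. With 87 characters and ten results per page (the page size observed in `results`, assumed to hold for every page), a ceiling division gives the number of pages to fetch:

```python
import math

count = 87          # from jsonResponse['count']
pageSize = 10       # results per page, as observed in jsonResponse['results']

pages = math.ceil(count / pageSize)
lastPageSize = count - (pages - 1) * pageSize

print(pages)          # 9 -> page 1 plus eight more 'next' calls
print(lastPageSize)   # 7 results on the final page
```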

In [12]:
secondCall = requests.get(jsonResponse['next'])

In [13]:
secondCall.json()

{'count': 87,
 'next': 'https://swapi.co/api/people/?page=3',
 'previous': 'https://swapi.co/api/people/?page=1',
 'results': [{'name': 'Anakin Skywalker',
   'height': '188',
   'mass': '84',
   'hair_color': 'blond',
   'skin_color': 'fair',
   'eye_color': 'blue',
   'birth_year': '41.9BBY',
   'gender': 'male',
   'homeworld': 'https://swapi.co/api/planets/1/',
   'films': ['https://swapi.co/api/films/5/',
    'https://swapi.co/api/films/4/',
    'https://swapi.co/api/films/6/'],
   'species': ['https://swapi.co/api/species/1/'],
   'vehicles': ['https://swapi.co/api/vehicles/44/',
    'https://swapi.co/api/vehicles/46/'],
   'starships': ['https://swapi.co/api/starships/59/',
    'https://swapi.co/api/starships/65/',
    'https://swapi.co/api/starships/39/'],
   'created': '2014-12-10T16:20:44.310000Z',
   'edited': '2014-12-20T21:17:50.327000Z',
   'url': 'https://swapi.co/api/people/11/'},
  {'name': 'Wilhuff Tarkin',
   'height': '180',
   'mass': 'unknown',
   'hair_color': 'a

In [14]:
secondJsonBody = secondCall.json()

In [15]:
secondDF = DataFrame(secondJsonBody['results'])
secondDF

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,41.9BBY,2014-12-10T16:20:44.310000Z,2014-12-20T21:17:50.327000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,blond,188,https://swapi.co/api/planets/1/,84,Anakin Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/59/, https://s...",https://swapi.co/api/people/11/,"[https://swapi.co/api/vehicles/44/, https://sw..."
1,64BBY,2014-12-10T16:26:56.138000Z,2014-12-20T21:17:50.330000Z,blue,"[https://swapi.co/api/films/6/, https://swapi....",male,"auburn, grey",180,https://swapi.co/api/planets/21/,unknown,Wilhuff Tarkin,fair,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/12/,[]
2,200BBY,2014-12-10T16:42:45.066000Z,2014-12-20T21:17:50.332000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,brown,228,https://swapi.co/api/planets/14/,112,Chewbacca,unknown,[https://swapi.co/api/species/3/],"[https://swapi.co/api/starships/10/, https://s...",https://swapi.co/api/people/13/,[https://swapi.co/api/vehicles/19/]
3,29BBY,2014-12-10T16:49:14.582000Z,2014-12-20T21:17:50.334000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",male,brown,180,https://swapi.co/api/planets/22/,80,Han Solo,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/10/, https://s...",https://swapi.co/api/people/14/,[]
4,44BBY,2014-12-10T17:03:30.334000Z,2014-12-20T21:17:50.336000Z,black,[https://swapi.co/api/films/1/],male,,173,https://swapi.co/api/planets/23/,74,Greedo,green,[https://swapi.co/api/species/4/],[],https://swapi.co/api/people/15/,[]
5,600BBY,2014-12-10T17:11:31.638000Z,2014-12-20T21:17:50.338000Z,orange,"[https://swapi.co/api/films/4/, https://swapi....",hermaphrodite,,175,https://swapi.co/api/planets/24/,1358,Jabba Desilijic Tiure,"green-tan, brown",[https://swapi.co/api/species/5/],[],https://swapi.co/api/people/16/,[]
6,21BBY,2014-12-12T11:08:06.469000Z,2014-12-20T21:17:50.341000Z,hazel,"[https://swapi.co/api/films/2/, https://swapi....",male,brown,170,https://swapi.co/api/planets/22/,77,Wedge Antilles,fair,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/18/,[https://swapi.co/api/vehicles/14/]
7,unknown,2014-12-12T11:16:56.569000Z,2014-12-20T21:17:50.343000Z,blue,[https://swapi.co/api/films/1/],male,brown,180,https://swapi.co/api/planets/26/,110,Jek Tono Porkins,fair,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/19/,[]
8,896BBY,2014-12-15T12:26:01.042000Z,2014-12-20T21:17:50.345000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",male,white,66,https://swapi.co/api/planets/28/,17,Yoda,green,[https://swapi.co/api/species/6/],[],https://swapi.co/api/people/20/,[]
9,82BBY,2014-12-15T12:48:05.971000Z,2014-12-20T21:17:50.347000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,grey,170,https://swapi.co/api/planets/8/,75,Palpatine,pale,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/21/,[]


The `next` call gave us ten more characters, and its response has a `next` URL of its own. Before we go on and try to call every `next` we possibly can, let's combine our two DataFrames into a single one. 

Luckily for us, pandas has a very simple function for this: `pd.concat` concatenates multiple DataFrames passed to it in a list: 

In [16]:
dataFrameList = [ starwarsDF, secondDF ]
superFrame = pd.concat(dataFrameList)
superFrame

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


Now that we've got all of our data in a single DataFrame, notice that our index is a little messed up: each page kept its own row labels, so `0-9` appears twice. 

What we can do, though, is renumber the rows with `reset_index`!

In [17]:
superFrame.reset_index()

,index,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


Notice that `reset_index` also added a brand new `index` column holding the old labels? We can discard it instead by passing `drop=True` as a parameter. 

Also, remember from before that many pandas methods do not modify the DataFrame you're actually working with; instead, they return a new, modified DataFrame. So we'll want to save the result to another variable. 

In [18]:
resetIndexList = superFrame.reset_index(drop=True)
resetIndexList

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]
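Two small checks make the index problem and its fix concrete. The sketch below uses two tiny hand-made frames in place of `starwarsDF` and `secondDF`: `index.is_unique` reveals the duplicated labels after a plain `pd.concat`, and passing `ignore_index=True` to `pd.concat` produces the same fresh `0..n-1` index as `reset_index(drop=True)` in a single step.

```python
import pandas as pd

page1 = pd.DataFrame({"name": ["Luke Skywalker", "C-3PO"]})
page2 = pd.DataFrame({"name": ["Anakin Skywalker", "Wilhuff Tarkin"]})

stacked = pd.concat([page1, page2])
print(stacked.index.tolist())      # [0, 1, 0, 1] (labels repeat)
print(stacked.index.is_unique)     # False

# Option 1: reset afterwards and discard the old labels.
viaReset = stacked.reset_index(drop=True)

# Option 2: tell concat to ignore the incoming labels.
viaIgnore = pd.concat([page1, page2], ignore_index=True)

print(viaIgnore.index.tolist())    # [0, 1, 2, 3]
print(viaReset.equals(viaIgnore))  # True
```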


So, now that we know how to call our API and concatenate DataFrames, let's write a function that gets all of the data for us. We could step through and call each `next` ourselves, but a short loop saves time and keeps our code uncluttered:  

In [19]:
def makeCallandReturnWholeList(): 
    nextURL = "https://swapi.dev/api/people/"
    dataframelist = list()

    while nextURL:                         ## as long as the next URL exists, make the call! 
        response = requests.get(nextURL, timeout=10)  ## call API (give up after 10 s)
        response.raise_for_status()        ## raise an error on a 4xx/5xx response
        jsonResponse = response.json()     ## parse the JSON body into a dict

        nextURL = jsonResponse['next']     ## get next url (None on the last page)
        jsonBody = jsonResponse['results'] ## get results

        dataFrame = DataFrame(jsonBody)    ## convert to df
        dataframelist.append(dataFrame)    ## append to list

    wholeDF = pd.concat(dataframelist)     ## concat whole list to make super dataframe
    resetIndexDF = wholeDF.reset_index(drop=True)  ## reset the indices and remove the old ones! 

    return resetIndexDF                    ## return the single big dataframe
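The loop above is tied to live HTTP calls, which makes it hard to try offline. A hypothetical variant, `collectPages`, takes the page-fetching step as a parameter so the same follow-the-`next`-link logic can run against fake pages. The `fakePages` data is invented for illustration; in real use you might pass something like `lambda url: requests.get(url, timeout=10).json()`.

```python
import pandas as pd


def collectPages(startURL, fetchJSON):
    """Follow 'next' links from startURL and return one DataFrame of all results.

    fetchJSON is any callable that takes a URL and returns the parsed JSON dict
    (a hypothetical hook, not part of requests or pandas).
    """
    frames = []
    nextURL = startURL
    while nextURL:                        # None (JSON null) on the last page
        page = fetchJSON(nextURL)
        frames.append(pd.DataFrame(page["results"]))
        nextURL = page["next"]
    if not frames:                        # pd.concat([]) would raise ValueError
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Invented two-page response set, shaped like SWAPI's.
fakePages = {
    "p1": {"next": "p2", "results": [{"name": "Luke Skywalker"}, {"name": "C-3PO"}]},
    "p2": {"next": None, "results": [{"name": "Anakin Skywalker"}]},
}

everyone = collectPages("p1", fakePages.get)
print(everyone["name"].tolist())
```

Injecting the fetch step keeps the pagination logic testable, and `ignore_index=True` replaces the separate `reset_index(drop=True)` call.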

In [20]:
allCharacters = makeCallandReturnWholeList()
allCharacters

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,52BBY,2014-12-10T15:52:14.024000Z,2014-12-20T21:17:50.317000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",male,"brown, grey",178,https://swapi.co/api/planets/1/,120,Owen Lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,47BBY,2014-12-10T15:53:41.121000Z,2014-12-20T21:17:50.319000Z,blue,"[https://swapi.co/api/films/5/, https://swapi....",female,brown,165,https://swapi.co/api/planets/1/,75,Beru Whitesun lars,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,unknown,2014-12-10T15:57:50.959000Z,2014-12-20T21:17:50.321000Z,red,[https://swapi.co/api/films/1/],,,97,https://swapi.co/api/planets/1/,32,R5-D4,"white, red",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,24BBY,2014-12-10T15:59:50.509000Z,2014-12-20T21:17:50.323000Z,brown,[https://swapi.co/api/films/1/],male,black,183,https://swapi.co/api/planets/1/,84,Biggs Darklighter,light,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,57BBY,2014-12-10T16:16:29.192000Z,2014-12-20T21:17:50.325000Z,blue-gray,"[https://swapi.co/api/films/2/, https://swapi....",male,"auburn, white",182,https://swapi.co/api/planets/20/,77,Obi-Wan Kenobi,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


Now that we've got a whole list of characters, let's see how long it is: 

In [21]:
len(allCharacters)

87

As a reminder from the previous lesson, you can select any number of rows from a DataFrame by indexing it with a boolean Series, such as: 

In [22]:
allCharacters[allCharacters['birth_year'] == "19BBY"]

,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
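Boolean Series can also be combined. A small sketch on a hand-made frame (not the full API data): `&` and `|` combine masks element-wise, each comparison needs its own parentheses because of operator precedence, and `isin` checks against several values at once.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Luke Skywalker", "C-3PO", "Leia Organa", "Owen Lars"],
    "birth_year": ["19BBY", "112BBY", "19BBY", "52BBY"],
    "gender": ["male", "n/a", "female", "male"],
})

# Both conditions must hold; parentheses are required around each comparison.
males19BBY = people[(people["birth_year"] == "19BBY") & (people["gender"] == "male")]
print(males19BBY["name"].tolist())   # ['Luke Skywalker']

# Match any of several values.
picked = people[people["name"].isin(["C-3PO", "Owen Lars"])]
print(picked["name"].tolist())       # ['C-3PO', 'Owen Lars']
```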


If you want to work on an individual row instead, you can select it by its integer position with the `iloc` indexer:

In [23]:
firstRow = allCharacters.iloc[0]
firstRow

birth_year                                                19BBY
created                             2014-12-09T13:50:51.644000Z
edited                              2014-12-20T21:17:56.891000Z
eye_color                                                  blue
films         [https://swapi.co/api/films/2/, https://swapi....
gender                                                     male
hair_color                                                blond
height                                                      172
homeworld                       https://swapi.co/api/planets/1/
mass                                                         77
name                                             Luke Skywalker
skin_color                                                 fair
species                       [https://swapi.co/api/species/1/]
starships     [https://swapi.co/api/starships/12/, https://s...
url                              https://swapi.co/api/people/1/
vehicles      [https://swapi.co/api/vehi
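`iloc` selects by integer position, while `loc` selects by index label. With the default `0..n-1` index the two happen to coincide, but they diverge once the index is something else. A small sketch with a made-up DataFrame indexed by character name:

```python
import pandas as pd

# Illustrative data; the index here is the character name, not 0..n-1
characters = pd.DataFrame(
    {'height': ['172', '167', '96'], 'mass': ['77', '75', '32']},
    index=['Luke Skywalker', 'C-3PO', 'R2-D2'],
)

byPosition = characters.iloc[0]      # first row, by position
byLabel = characters.loc['R2-D2']    # row whose index label is 'R2-D2'

print(byPosition.name)     # Luke Skywalker (a row's name is its index label)
print(byLabel['height'])   # 96

# Both return a Series whose index is the DataFrame's column names
print(type(byLabel).__name__)   # Series
print(byLabel.index.tolist())   # ['height', 'mass']
```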

Now we can see what each individual row looks like. As another reminder, selecting a single row (or a single column) from a DataFrame gives you back a Series. Think of a Series as a one-dimensional array whose values are labeled, much like a dictionary. 

In [24]:
type(firstRow)

pandas.core.series.Series

You can access a Series' values with dictionary-style notation (e.g. `series["someKey"]`), but you can also access them as attributes with dot notation: 

In [25]:
firstRow.homeworld

'https://swapi.co/api/planets/1/'

In [26]:
firstRow.starships

['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/']
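Dot access is convenient, but it only works when the key is a valid Python identifier and doesn't collide with an existing Series attribute or method. A quick sketch, using a hypothetical `count` key to show the collision:

```python
import pandas as pd

row = pd.Series({'name': 'Luke Skywalker', 'hair_color': 'blond', 'count': 3})

print(row['hair_color'])   # blond
print(row.hair_color)      # blond (same value via attribute access)

# 'count' collides with the built-in Series.count method, so dot access
# returns the method instead of our value; bracket access is unambiguous
print(callable(row.count))  # True
print(row['count'])         # 3
```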

### URLs in our data? 

Now, let's take a gander at those URL values. They clearly look like more API calls: each one points to another SWAPI endpoint. It's not always the case that the data we receive from an API (or really anywhere, for that matter) will be perfect. Sometimes you may find yourself needing to make additional API calls to fill your data out. 


We could loop over our URIs with a plain `for` loop, like the following code (here we just print each one): 

```python
for x in firstRow.starships: 
    print(x)
```

While this approach is fine, we can do better. With Python's built-in `map` function, we can apply a function to every item in our list more concisely: 

In [27]:
def addStringtoURL(x): 
    return "URL NAME STRING + " + x

In [28]:
stuff = map(addStringtoURL, firstRow.starships)
stuffList = list(stuff)
print(stuffList)

['URL NAME STRING + https://swapi.co/api/starships/12/', 'URL NAME STRING + https://swapi.co/api/starships/22/']
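One detail worth knowing: in Python 3, `map` returns a lazy iterator rather than a list, which is why we wrap it in `list()`. The iterator can only be consumed once, as this small sketch (reusing the same `addStringtoURL` idea on two sample URLs) shows:

```python
def addStringtoURL(x):
    return "URL NAME STRING + " + x

urls = ['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/']

stuff = map(addStringtoURL, urls)   # nothing has been computed yet
firstPass = list(stuff)             # consumes the iterator
secondPass = list(stuff)            # the iterator is already exhausted

print(len(firstPass))   # 2
print(secondPass)       # []
```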


If our function is short enough, we can pass `map` a lambda (an anonymous, inline function) so we don't have to define an entirely new named function. That being said, I'm personally of the opinion that spelling everything out is usually clearer. Still, knowing that you can use lambdas helps when you want to condense code: 

In [29]:
lambdaStuff = map(lambda x: "URL NAME STRING + " + x, firstRow.starships)
list(lambdaStuff)

['URL NAME STRING + https://swapi.co/api/starships/12/',
 'URL NAME STRING + https://swapi.co/api/starships/22/']
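The same transformation is often written as a list comprehension, which many Python programmers find more readable than `map` with a `lambda`, and which produces a list directly. A short sketch on the same two sample URLs:

```python
urls = ['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/']

# Equivalent to list(map(lambda x: "URL NAME STRING + " + x, urls))
comprehensionStuff = ["URL NAME STRING + " + x for x in urls]
print(comprehensionStuff)
```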

To find out more about `map` (as well as other useful built-ins such as `filter`), check out the built-in functions page of the Python documentation.


Let's get back to our data and see what one of these URLs returns: 

In [30]:
response = requests.get(firstRow.homeworld)
response.json()

{'name': 'Tatooine',
 'rotation_period': '23',
 'orbital_period': '304',
 'diameter': '10465',
 'climate': 'arid',
 'gravity': '1 standard',
 'terrain': 'desert',
 'surface_water': '1',
 'population': '200000',
 'residents': ['https://swapi.co/api/people/1/',
  'https://swapi.co/api/people/2/',
  'https://swapi.co/api/people/4/',
  'https://swapi.co/api/people/6/',
  'https://swapi.co/api/people/7/',
  'https://swapi.co/api/people/8/',
  'https://swapi.co/api/people/9/',
  'https://swapi.co/api/people/11/',
  'https://swapi.co/api/people/43/',
  'https://swapi.co/api/people/62/'],
 'films': ['https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 'created': '2014-12-09T13:50:49.641000Z',
 'edited': '2014-12-21T20:48:04.175778Z',
 'url': 'https://swapi.co/api/planets/1/'}

Instead of constantly writing: 

```python
response = requests.get(firstRow.homeworld)
response.json()
```

Let's create a function that will return the parsed JSON for us:

In [31]:
def getJsonFromRequests(URL): 
    response = requests.get(URL)
    return response.json()

Now let's test it out on our first homeworld:

In [32]:
getJsonFromRequests(firstRow.homeworld)

{'name': 'Tatooine',
 'rotation_period': '23',
 'orbital_period': '304',
 'diameter': '10465',
 'climate': 'arid',
 'gravity': '1 standard',
 'terrain': 'desert',
 'surface_water': '1',
 'population': '200000',
 'residents': ['https://swapi.co/api/people/1/',
  'https://swapi.co/api/people/2/',
  'https://swapi.co/api/people/4/',
  'https://swapi.co/api/people/6/',
  'https://swapi.co/api/people/7/',
  'https://swapi.co/api/people/8/',
  'https://swapi.co/api/people/9/',
  'https://swapi.co/api/people/11/',
  'https://swapi.co/api/people/43/',
  'https://swapi.co/api/people/62/'],
 'films': ['https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 'created': '2014-12-09T13:50:49.641000Z',
 'edited': '2014-12-21T20:48:04.175778Z',
 'url': 'https://swapi.co/api/planets/1/'}

Now that we have the planet data, we can look through it to see what we want. Personally, I think the only data we need from this is the planet name. We could pull as much of this as we want, but for now, let's just stick to the planet name. 

Let's try our new function on the list of starship URLs with `map`:

In [33]:
responseMap =  map(getJsonFromRequests, firstRow.starships)
responseList = list(responseMap)
responseList

[{'name': 'X-wing',
  'model': 'T-65 X-wing',
  'manufacturer': 'Incom Corporation',
  'cost_in_credits': '149999',
  'length': '12.5',
  'max_atmosphering_speed': '1050',
  'crew': '1',
  'passengers': '0',
  'cargo_capacity': '110',
  'consumables': '1 week',
  'hyperdrive_rating': '1.0',
  'MGLT': '100',
  'starship_class': 'Starfighter',
  'pilots': ['https://swapi.co/api/people/1/',
   'https://swapi.co/api/people/9/',
   'https://swapi.co/api/people/18/',
   'https://swapi.co/api/people/19/'],
  'films': ['https://swapi.co/api/films/2/',
   'https://swapi.co/api/films/3/',
   'https://swapi.co/api/films/1/'],
  'created': '2014-12-12T11:19:05.340000Z',
  'edited': '2014-12-22T17:35:44.491233Z',
  'url': 'https://swapi.co/api/starships/12/'},
 {'name': 'Imperial shuttle',
  'model': 'Lambda-class T-4a shuttle',
  'manufacturer': 'Sienar Fleet Systems',
  'cost_in_credits': '240000',
  'length': '20',
  'max_atmosphering_speed': '850',
  'crew': '6',
  'passengers': '20',
  'cargo_

Not the best-formatted response, but we did get a list of responses back. What we should take note of in particular is that both our `homeworld` and our `starships` responses have a `name` key. Let's do a quick check on each of the others, that is, `films`, `species`, and `vehicles` (and because we're lazy, let's write a function). But first, let's see how to determine what's a list and what's not: 

In [34]:
def hasNameKey(columnData):
    if type(columnData) is list: 
        print("list")
    else: 
        print("not list")

In [35]:
hasNameKey(firstRow.films)

list


In [36]:
hasNameKey(firstRow.homeworld)

not list
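`type(x) is list` works here, but the more common idiom is `isinstance`, which also accepts subclasses and can check several types at once. A small sketch (the function name `describeValue` is hypothetical, used only for illustration):

```python
def describeValue(columnData):
    # isinstance is the idiomatic type check; a tuple of types checks several at once
    if isinstance(columnData, (list, tuple)):
        return "list"
    return "not list"

print(describeValue(['https://swapi.co/api/films/2/']))  # list
print(describeValue('https://swapi.co/api/planets/1/'))  # not list
```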


Now let's check whether each value's data has the key we want. We'll use the `getJsonFromRequests` function we made earlier, along with the dictionary `.get` method, which returns `None` when a key is missing:

In [37]:
def hasNameKey(columnData, keyName='name'): 
    if type(columnData) is list: 
        jsonFromRequestMap = map(getJsonFromRequests, columnData)
        responseList = list(jsonFromRequestMap)
        print(responseList)
        if responseList[0].get(keyName) is not None:
            return True
        return False
    else: 
        jsonFromRequest = getJsonFromRequests(columnData)
        print(jsonFromRequest)
        if jsonFromRequest.get(keyName) is not None: 
            return True
        return False

In [38]:
hasNameKey(firstRow.starships)

[{'name': 'X-wing', 'model': 'T-65 X-wing', 'manufacturer': 'Incom Corporation', 'cost_in_credits': '149999', 'length': '12.5', 'max_atmosphering_speed': '1050', 'crew': '1', 'passengers': '0', 'cargo_capacity': '110', 'consumables': '1 week', 'hyperdrive_rating': '1.0', 'MGLT': '100', 'starship_class': 'Starfighter', 'pilots': ['https://swapi.co/api/people/1/', 'https://swapi.co/api/people/9/', 'https://swapi.co/api/people/18/', 'https://swapi.co/api/people/19/'], 'films': ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/'], 'created': '2014-12-12T11:19:05.340000Z', 'edited': '2014-12-22T17:35:44.491233Z', 'url': 'https://swapi.co/api/starships/12/'}, {'name': 'Imperial shuttle', 'model': 'Lambda-class T-4a shuttle', 'manufacturer': 'Sienar Fleet Systems', 'cost_in_credits': '240000', 'length': '20', 'max_atmosphering_speed': '850', 'crew': '6', 'passengers': '20', 'cargo_capacity': '80000', 'consumables': '2 months', 'hyperdrive_rating'

True

In [39]:
hasNameKey(firstRow.homeworld)

{'name': 'Tatooine', 'rotation_period': '23', 'orbital_period': '304', 'diameter': '10465', 'climate': 'arid', 'gravity': '1 standard', 'terrain': 'desert', 'surface_water': '1', 'population': '200000', 'residents': ['https://swapi.co/api/people/1/', 'https://swapi.co/api/people/2/', 'https://swapi.co/api/people/4/', 'https://swapi.co/api/people/6/', 'https://swapi.co/api/people/7/', 'https://swapi.co/api/people/8/', 'https://swapi.co/api/people/9/', 'https://swapi.co/api/people/11/', 'https://swapi.co/api/people/43/', 'https://swapi.co/api/people/62/'], 'films': ['https://swapi.co/api/films/5/', 'https://swapi.co/api/films/4/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/'], 'created': '2014-12-09T13:50:49.641000Z', 'edited': '2014-12-21T20:48:04.175778Z', 'url': 'https://swapi.co/api/planets/1/'}


True
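Two things to keep in mind about `hasNameKey`: the `.get` check treats a key that is present with a `None` value the same as a missing key, and the list branch only inspects the first item (so an empty list would raise `IndexError`). If you only care whether the key exists, the `in` operator is more direct. A sketch of a variant that checks every item; the name `allHaveKey` is hypothetical and it works on already-fetched dictionaries rather than URLs:

```python
def allHaveKey(jsonObjects, keyName='name'):
    # True only if the list is non-empty and every dictionary contains keyName
    return len(jsonObjects) > 0 and all(keyName in obj for obj in jsonObjects)

starships = [{'name': 'X-wing'}, {'name': 'Imperial shuttle'}]
films = [{'title': 'The Empire Strikes Back'}]

print(allHaveKey(starships))        # True
print(allHaveKey(films))            # False
print(allHaveKey(films, 'title'))   # True
print(allHaveKey([]))               # False
```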

Now, we have a list of columns that we'd like to run this over (primarily those we've seen above with URIs in them). One thing we haven't done, though, is write a function to determine which columns have a URI and which don't. It's easy to do by eye for this dataset since it's rather small, but the larger the set, the more tedious it is to do by hand, so let's write another function.

We could try pandas' built-in Series string methods (the `.str` accessor), so let's see what happens when we do. Let's take the first row again and check which values contain `https`: 

In [40]:
firstRow.str.contains('https')

birth_year    False
created       False
edited        False
eye_color     False
films           NaN
gender        False
hair_color    False
height        False
homeworld      True
mass          False
name          False
skin_color    False
species         NaN
starships       NaN
url            True
vehicles        NaN
Name: 0, dtype: object
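`.str.contains` also accepts an `na` argument that fills in those `NaN` results. Setting `na=False` gives a clean boolean Series, but it still reports `False` for list values even when they hold URLs, so it doesn't solve our problem on its own. A small sketch on made-up values:

```python
import pandas as pd

# Mixed object Series: two strings and one list, like our row
values = pd.Series(['https://swapi.co/api/planets/1/', 'blue', ['https://swapi.co/api/films/1/']])

# na=False replaces the missing results for non-string values with False
hasHttps = values.str.contains('https', na=False)
print(hasHttps.tolist())   # [True, False, False]: the list of URLs is still False
```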

This works well when every value is a string, but the `.str` methods return `NaN` for anything that isn't one, such as our list columns. Let's get around this by writing a quick function to apply to our Series to get a boolean Series back. 


In order to iterate over each column, let's start by printing out each column name and its value: 

In [41]:
def columnHasLink(dataRow): 
    # Series.iteritems() was removed in pandas 2.0; .items() does the same job
    for index, val in dataRow.items():
        print("Index: ", index, "\nValue: ", val, "\n\n")

In [42]:
columnHasLink(firstRow)

Index:  birth_year 
Value:  19BBY 


Index:  created 
Value:  2014-12-09T13:50:51.644000Z 


Index:  edited 
Value:  2014-12-20T21:17:56.891000Z 


Index:  eye_color 
Value:  blue 


Index:  films 
Value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/'] 


Index:  gender 
Value:  male 


Index:  hair_color 
Value:  blond 


Index:  height 
Value:  172 


Index:  homeworld 
Value:  https://swapi.co/api/planets/1/ 


Index:  mass 
Value:  77 


Index:  name 
Value:  Luke Skywalker 


Index:  skin_color 
Value:  fair 


Index:  species 
Value:  ['https://swapi.co/api/species/1/'] 


Index:  starships 
Value:  ['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/'] 


Index:  url 
Value:  https://swapi.co/api/people/1/ 


Index:  vehicles 
Value:  ['https://swapi.co/api/vehicles/14/', 'https://swapi.co/api/vehicles/30/'] 




Now that we know how to do that, let's write our function to determine whether or not each column has a URI in it. 

In [43]:
def columnHasLink(dataRow): 
    for index, val in dataRow.items(): 
        if type(val) is list: 
            print(index, "(list)", "https" in val[0])
            continue
        print(index, "https" in val)
        

In [44]:
columnHasLink(firstRow)

birth_year False
created False
edited False
eye_color False
films (list) True
gender False
hair_color False
height False
homeworld True
mass False
name False
skin_color False
species (list) True
starships (list) True
url True
vehicles (list) True


We can see whether or not our column has an 'https' in it, but how do we apply this to our series to get our boolean series back?

Let's take a look at the internal workings of our function. All we really need is the logic that returns a boolean for each value, so let's rework our function to handle a single value: 

In [45]:
def booleanApplication(val):
    # Lists: check the first URL (an empty list has no URLs to check)
    if type(val) is list: 
        return len(val) > 0 and 'https' in val[0]
    return 'https' in val

Now, this is the first time we'll be using the `apply` method. Called on a Series, `apply` runs a function on each value and collects the results into a new Series (on a DataFrame, it applies a function along an axis instead). Let's apply our function: 

In [46]:
hasURLSeries = firstRow.apply(booleanApplication)
hasURLSeries

birth_year    False
created       False
edited        False
eye_color     False
films          True
gender        False
hair_color    False
height        False
homeworld      True
mass          False
name          False
skin_color    False
species        True
starships      True
url            True
vehicles       True
Name: 0, dtype: bool
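To make the Series-versus-DataFrame difference concrete: on a Series, `apply` calls the function once per value, while on a DataFrame it passes whole columns (the default, `axis=0`) or whole rows (`axis=1`). A small sketch with made-up numeric data:

```python
import pandas as pd

df = pd.DataFrame({'height': [172, 167, 96], 'mass': [77, 75, 32]})

# Series.apply: the function receives each individual value
doubledHeights = df['height'].apply(lambda h: h * 2)
print(doubledHeights.tolist())   # [344, 334, 192]

# DataFrame.apply with axis=0: the function receives each column as a Series
columnMax = df.apply(lambda column: column.max())
print(columnMax.to_dict())       # {'height': 172, 'mass': 77}

# DataFrame.apply with axis=1: the function receives each row as a Series
rowSums = df.apply(lambda row: row['height'] + row['mass'], axis=1)
print(rowSums.tolist())          # [249, 242, 128]
```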

Now, let's use that boolean Series to filter our row: 

In [47]:
firstRow[hasURLSeries]

films        [https://swapi.co/api/films/2/, https://swapi....
homeworld                      https://swapi.co/api/planets/1/
species                      [https://swapi.co/api/species/1/]
starships    [https://swapi.co/api/starships/12/, https://s...
url                             https://swapi.co/api/people/1/
vehicles     [https://swapi.co/api/vehicles/14/, https://sw...
Name: 0, dtype: object

Perfect. Now we've got all of the columns that have URIs in them. Let's use those to find out whether each one's data has a `name` key, so that we can replace those URIs with actual usable data. Let's do what we just did before and map the `hasNameKey` function we wrote earlier over our filtered row: 

In [48]:
firstRow[hasURLSeries].map(hasNameKey)

[{'title': 'The Empire Strikes Back', 'episode_id': 5, 'opening_crawl': 'It is a dark time for the\r\nRebellion. Although the Death\r\nStar has been destroyed,\r\nImperial troops have driven the\r\nRebel forces from their hidden\r\nbase and pursued them across\r\nthe galaxy.\r\n\r\nEvading the dreaded Imperial\r\nStarfleet, a group of freedom\r\nfighters led by Luke Skywalker\r\nhas established a new secret\r\nbase on the remote ice world\r\nof Hoth.\r\n\r\nThe evil lord Darth Vader,\r\nobsessed with finding young\r\nSkywalker, has dispatched\r\nthousands of remote probes into\r\nthe far reaches of space....', 'director': 'Irvin Kershner', 'producer': 'Gary Kurtz, Rick McCallum', 'release_date': '1980-05-17', 'characters': ['https://swapi.co/api/people/1/', 'https://swapi.co/api/people/2/', 'https://swapi.co/api/people/3/', 'https://swapi.co/api/people/4/', 'https://swapi.co/api/people/5/', 'https://swapi.co/api/people/10/', 'https://swapi.co/api/people/13/', 'https://swapi.co/api/peop

films        False
homeworld     True
species       True
starships     True
url           True
vehicles      True
Name: 0, dtype: bool

Looks like each of our data points has a name except for films. Let's take a quick look at films to see what's up: 

In [49]:
getJsonFromRequests(firstRow['films'][0])

{'title': 'The Empire Strikes Back',
 'episode_id': 5,
 'opening_crawl': 'It is a dark time for the\r\nRebellion. Although the Death\r\nStar has been destroyed,\r\nImperial troops have driven the\r\nRebel forces from their hidden\r\nbase and pursued them across\r\nthe galaxy.\r\n\r\nEvading the dreaded Imperial\r\nStarfleet, a group of freedom\r\nfighters led by Luke Skywalker\r\nhas established a new secret\r\nbase on the remote ice world\r\nof Hoth.\r\n\r\nThe evil lord Darth Vader,\r\nobsessed with finding young\r\nSkywalker, has dispatched\r\nthousands of remote probes into\r\nthe far reaches of space....',
 'director': 'Irvin Kershner',
 'producer': 'Gary Kurtz, Rick McCallum',
 'release_date': '1980-05-17',
 'characters': ['https://swapi.co/api/people/1/',
  'https://swapi.co/api/people/2/',
  'https://swapi.co/api/people/3/',
  'https://swapi.co/api/people/4/',
  'https://swapi.co/api/people/5/',
  'https://swapi.co/api/people/10/',
  'https://swapi.co/api/people/13/',
  'https:

Interestingly enough, while every other data point in this set has a `name`, the `films` objects are identified by `title`. To make sure we're actually getting all of our data, our function should pull out either `name` or `title`. Because I love functions for everything, let's write a single function that will give us either the name OR the title: 

In [50]:
def getNameOrTitle(jsonObject): 
    if jsonObject.get('name') is not None: 
        return jsonObject['name']
    if jsonObject.get('title') is not None: 
        return jsonObject['title']
    
    return 'XXXXXXXXXXXXX'
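The same fallback can be written as a chain of `.get` calls joined with `or`. One behavioral difference: `or` also skips falsy values such as an empty string, whereas `getNameOrTitle` only skips `None`. A sketch (the name `getNameOrTitleCompact` is hypothetical):

```python
def getNameOrTitleCompact(jsonObject):
    # Returns the first truthy value among 'name' and 'title', else the placeholder
    return jsonObject.get('name') or jsonObject.get('title') or 'XXXXXXXXXXXXX'

print(getNameOrTitleCompact({'name': 'Tatooine'}))                        # Tatooine
print(getNameOrTitleCompact({'title': 'The Empire Strikes Back'}))        # The Empire Strikes Back
print(getNameOrTitleCompact({'url': 'https://swapi.co/api/planets/1/'}))  # XXXXXXXXXXXXX
```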

Let's test our function: 

In [51]:
getNameOrTitle(getJsonFromRequests(firstRow['homeworld']))

'Tatooine'

In [52]:
getNameOrTitle(getJsonFromRequests(firstRow['films']))

InvalidSchema: No connection adapters were found for '['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/']'

One thing our function above doesn't account for is whether we're passing in a list or a single string. Let's see if it works when we pass in a single string from the `films` list: 

In [53]:
getNameOrTitle(getJsonFromRequests(firstRow['films'][0]))

'The Empire Strikes Back'

We could rewrite our function to work with both strings and lists of strings, but it's nice to keep our helper functions simple. Instead, let's put that logic in a different function. At this point, we're ready to write a function that actually returns data for our Series: 

In [54]:
def getURLData(value): 
    if type(value) is list: 
        listValues = map(lambda x: getNameOrTitle(getJsonFromRequests(x)), value)
        return list(listValues)
    return getNameOrTitle(getJsonFromRequests(value))

In [55]:
firstRow[hasURLSeries].apply(getURLData)

films        [The Empire Strikes Back, Revenge of the Sith,...
homeworld                                             Tatooine
species                                                [Human]
starships                           [X-wing, Imperial shuttle]
url                                             Luke Skywalker
vehicles                  [Snowspeeder, Imperial Speeder Bike]
Name: 0, dtype: object

GREAT! Now we've got this data. The catch is that none of these operations changed `allCharacters` in place, so we'll have to write the results back ourselves. Rather than hand-picking which values to update, we can write one function that checks whether a value contains a URI (logic we've already written) and, if it does, fetches the data for it! 

In [56]:
def getURLDataForAllValues(val): 
    print("value: ", val)
    print(" Type: ", type(val))
    if type(val) is list and len(val) > 0 and 'https' in val[0]:
        return getURLData(val)
    elif 'https' in val:
        return getURLData(val)
    else: 
        return val


Let's test the function out by running it with the first row: 

In [57]:
allCharacters.iloc[0].apply(getURLDataForAllValues)

value:  19BBY
 Type:  <class 'str'>
value:  2014-12-09T13:50:51.644000Z
 Type:  <class 'str'>
value:  2014-12-20T21:17:56.891000Z
 Type:  <class 'str'>
value:  blue
 Type:  <class 'str'>
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/']
 Type:  <class 'list'>
value:  male
 Type:  <class 'str'>
value:  blond
 Type:  <class 'str'>
value:  172
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  77
 Type:  <class 'str'>
value:  Luke Skywalker
 Type:  <class 'str'>
value:  fair
 Type:  <class 'str'>
value:  ['https://swapi.co/api/species/1/']
 Type:  <class 'list'>
value:  ['https://swapi.co/api/starships/12/', 'https://swapi.co/api/starships/22/']
 Type:  <class 'list'>
value:  https://swapi.co/api/people/1/
 Type:  <class 'str'>
value:  ['https://swapi.co/api/vehicles/14/', 'https://swapi.co/api/vehicles/30/']
 Type:  <class 'l

birth_year                                                19BBY
created                             2014-12-09T13:50:51.644000Z
edited                              2014-12-20T21:17:56.891000Z
eye_color                                                  blue
films         [The Empire Strikes Back, Revenge of the Sith,...
gender                                                     male
hair_color                                                blond
height                                                      172
homeworld                                              Tatooine
mass                                                         77
name                                             Luke Skywalker
skin_color                                                 fair
species                                                 [Human]
starships                            [X-wing, Imperial shuttle]
url                                              Luke Skywalker
vehicles                   [Snowspeeder,

Great! Now that we know it works for each of the columns of the first row, let's try it for each of the columns of our entire dataset! 

One thing to note: because we want to change the actual values within our columns, we can overwrite our data and be OK with it. The simplest way to do that is to reassign the column: 

```python
dataFrame['columnName'] = dataFrame['columnName'].apply(someFunction)
```

This is perfectly valid pandas. What pandas warns about (with a `SettingWithCopyWarning`) is *chained* assignment, such as `dataFrame[mask]['columnName'] = ...`, which may write to a temporary copy instead of `dataFrame`. Using the `.loc` indexer makes the target of the assignment explicit, so that's the form we'll use here: 

```python
dataFrame.loc[:, 'columnName'] = dataFrame['columnName'].apply(someFunction)
```
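The same `.loc` form also handles conditional updates in one step, taking a boolean mask for the rows and a label for the column. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Luke Skywalker', 'C-3PO'], 'gender': ['male', 'n/a']})

# Chained form to avoid: df[df['gender'] == 'n/a']['gender'] = 'none'
# (it may modify a temporary copy and leave df unchanged)

# One .loc call: row selector (boolean mask) and column label together
df.loc[df['gender'] == 'n/a', 'gender'] = 'none'
print(df['gender'].tolist())   # ['male', 'none']
```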

In [58]:
allCharacters.loc[:, 'homeworld'] = allCharacters['homeworld'].apply(getURLDataForAllValues)

value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/8/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/2/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/20/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/1/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/21/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/14/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/22/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/23/
 Type:  <class 'str'>
value:  https://swapi.co/api/planets/24/
 Type:  <class 'str'>
va

In [59]:
allCharacters.head()

Unnamed: 0,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,Tatooine,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,Tatooine,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,Naboo,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,Tatooine,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,Alderaan,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]


We've done it for one column, but now let's see if we can do the above for all the columns with a function!

Instead of writing a function to iterate over every single possible column in our DataFrame, we can just as easily pass in a list of columns to get only the desired columns!

This is new! Previously we've been passing in a singular key for a single column, such as: 

```python
dataFrame['columnName']
```

What we can do instead is pass in a list of column names so that we get multiple columns back! 

```python
dataFrame[['columnName1', 'columnName2', 'columnName3']]
```
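Note the double brackets: the outer pair is the indexing operator and the inner pair is a Python list. Passing a single string returns a Series, while passing a list (even a one-item list) returns a DataFrame. A quick sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Luke Skywalker'], 'homeworld': ['Tatooine'], 'mass': ['77']})

print(type(df['name']).__name__)                    # Series
print(type(df[['name']]).__name__)                  # DataFrame
print(df[['name', 'homeworld']].columns.tolist())   # ['name', 'homeworld']
```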

How do we get those columns, though? Remember earlier when we created a giant boolean Series for which columns had URL data? 

In [60]:
hasURLSeries

birth_year    False
created       False
edited        False
eye_color     False
films          True
gender        False
hair_color    False
height        False
homeworld      True
mass          False
name          False
skin_color    False
species        True
starships      True
url            True
vehicles       True
Name: 0, dtype: bool

Let's use this Series to generate a list of columns from our first row with `.index` and `.to_list()`. The `.index` attribute holds the Series' labels (here, the column names), and the `.to_list()` method returns them as a plain Python list: 

In [61]:
firstRow[hasURLSeries].index.to_list()

['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']

Let's store these as a variable and write a function! 

In [62]:
listOfColumnsWithURLs = firstRow[hasURLSeries].index.to_list()
originalListOfColumnsWithURLS = list(listOfColumnsWithURLs)
listOfColumnsWithURLs

['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']

In [63]:
def getDataForAllColumns(dataFrame, columnList): 
    for column in dataFrame[columnList].columns: 
        print("Getting data for:", column)
        dataFrame.loc[:, column] = dataFrame[column].apply(getURLDataForAllValues)

In [64]:
allCharacters[listOfColumnsWithURLs]

Unnamed: 0,films,homeworld,species,starships,url,vehicles
0,"[https://swapi.co/api/films/2/, https://swapi....",Tatooine,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,"[https://swapi.co/api/films/2/, https://swapi....",Tatooine,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,"[https://swapi.co/api/films/2/, https://swapi....",Naboo,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,"[https://swapi.co/api/films/2/, https://swapi....",Tatooine,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,"[https://swapi.co/api/films/2/, https://swapi....",Alderaan,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]
5,"[https://swapi.co/api/films/5/, https://swapi....",Tatooine,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/6/,[]
6,"[https://swapi.co/api/films/5/, https://swapi....",Tatooine,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/7/,[]
7,[https://swapi.co/api/films/1/],Tatooine,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/8/,[]
8,[https://swapi.co/api/films/1/],Tatooine,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],https://swapi.co/api/people/9/,[]
9,"[https://swapi.co/api/films/2/, https://swapi....",Stewjon,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",https://swapi.co/api/people/10/,[https://swapi.co/api/vehicles/38/]


Because we've already updated the `homeworld` column, let's remove `homeworld` from our list. Unlike most Series and DataFrame methods, which return a new object, a list's `.remove` method modifies the list in place. That's why we saved a copy (`originalListOfColumnsWithURLS`) in the previous cell, just in case we need it later: 

In [65]:
listOfColumnsWithURLs.remove('homeworld')

print('all values: ', originalListOfColumnsWithURLS)
print('removed homeworld', listOfColumnsWithURLs)

all values:  ['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']
removed homeworld ['films', 'species', 'starships', 'url', 'vehicles']
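A few more details about `list.remove`: it returns `None`, deletes only the first matching item, and raises `ValueError` if the item isn't there. If you'd rather leave the original list untouched, a list comprehension builds a new, filtered list instead. A short sketch:

```python
columns = ['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']

# Build a new list without 'homeworld'; `columns` itself is unchanged
withoutHomeworld = [c for c in columns if c != 'homeworld']
print(withoutHomeworld)   # ['films', 'species', 'starships', 'url', 'vehicles']
print(len(columns))       # 6

# list.remove mutates in place and returns None
result = columns.remove('homeworld')
print(result)             # None

# Removing an item that is no longer there raises ValueError
try:
    columns.remove('homeworld')
except ValueError:
    print("'homeworld' is no longer in the list")
```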


Network calls are slow compared to in-memory computation, and we're about to make a lot of them. Let's write a function to timestamp our computation so we can see how long it takes. We'll do this by using the `datetime` module, which is part of Python's standard library, so there's nothing to install.

In [66]:
import datetime 

def gettimeStamp(): 
    currentDT = datetime.datetime.now()
    print (str(currentDT))
    return currentDT
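Wall-clock timestamps are handy for logging when something ran. For measuring how long a block takes, the standard library's `time.perf_counter()` is designed for that purpose (it's monotonic and high resolution). A minimal sketch, with `time.sleep` standing in for the slow network calls:

```python
import time

start = time.perf_counter()
time.sleep(0.1)            # stand-in for the slow work being timed
elapsed = time.perf_counter() - start

print(f"took {elapsed:.2f} seconds")
```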

Now that we have our timestamping function, let's call our new function! 

In [67]:
startTime = gettimeStamp()
getDataForAllColumns(allCharacters, listOfColumnsWithURLs)
endTime = gettimeStamp()

difference = endTime - startTime
print(difference)

2019-11-06 23:32:02.564310
Getting data for: films
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/']
 Type:  <class 'list'>
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/5/', 'https://swapi.co/api/films/4/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/']
 Type:  <class 'list'>
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/5/', 'https://swapi.co/api/films/4/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/', 'https://swapi.co/api/films/7/']
 Type:  <class 'list'>
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/api/films/3/', 'https://swapi.co/api/films/1/']
 Type:  <class 'list'>
value:  ['https://swapi.co/api/films/2/', 'https://swapi.co/api/films/6/', 'https://swapi.co/ap
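Many of these requests repeat: Tatooine, for example, is fetched once for every character from there. Caching responses by URL avoids refetching them. Below is a minimal sketch using the standard library's `functools.lru_cache`; the wrapper name `makeCachedFetcher` is hypothetical, and a fake fetch function stands in for `getJsonFromRequests` so the example runs offline:

```python
from functools import lru_cache

def makeCachedFetcher(fetch):
    # Wrap any one-argument fetch function so repeated URLs are served from memory.
    # Cached results are shared objects, so avoid mutating the returned dicts.
    @lru_cache(maxsize=None)
    def cachedFetch(url):
        return fetch(url)
    return cachedFetch

callCount = 0

def fakeFetch(url):
    # Stand-in for getJsonFromRequests; counts how often it is actually called
    global callCount
    callCount += 1
    return {'url': url, 'name': 'Tatooine'}

cachedGet = makeCachedFetcher(fakeFetch)
for url in ['https://swapi.co/api/planets/1/'] * 3 + ['https://swapi.co/api/planets/2/']:
    cachedGet(url)

print(callCount)   # 2 (one real call per distinct URL)
```

In the notebook, something like `makeCachedFetcher(getJsonFromRequests)` could then be used in place of `getJsonFromRequests` inside `getURLData`.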

In [68]:
allCharacters.head()

Unnamed: 0,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[The Empire Strikes Back, Revenge of the Sith,...",male,blond,172,Tatooine,77,Luke Skywalker,fair,[Human],"[X-wing, Imperial shuttle]",Luke Skywalker,"[Snowspeeder, Imperial Speeder Bike]"
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[The Empire Strikes Back, Attack of the Clones...",,,167,Tatooine,75,C-3PO,gold,[Droid],[],C-3PO,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[The Empire Strikes Back, Attack of the Clones...",,,96,Naboo,32,R2-D2,"white, blue",[Droid],[],R2-D2,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[The Empire Strikes Back, Revenge of the Sith,...",male,none,202,Tatooine,136,Darth Vader,white,[Human],[TIE Advanced x1],Darth Vader,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[The Empire Strikes Back, Revenge of the Sith,...",female,brown,150,Alderaan,49,Leia Organa,light,[Human],[],Leia Organa,[Imperial Speeder Bike]


Now that we've got the data, let's take a look and see what we have. First, let's list the unique values in each column using the Series' built-in `unique` method:

In [69]:
for column in allCharacters: 
    print(column)
    uniqValues = allCharacters[column].unique()
    

birth_year
created
edited
eye_color
films


TypeError: unhashable type: 'list'

`unique` works for the first few columns but raises a `TypeError` when it reaches `films`. That happens because `unique` hashes each value, and Python lists (which is what each `films` cell holds) are unhashable. For now, let's skip the list columns and get the unique values only for the columns that don't hold lists:

In [70]:
for column in allCharacters: 
    print(column)
    if type(allCharacters[column].iloc[0]) is list:
        continue
    uniqValues = allCharacters[column].unique()
    print("Column: ", column, " Values: ", uniqValues)
    

birth_year
Column:  birth_year  Values:  ['19BBY' '112BBY' '33BBY' '41.9BBY' '52BBY' '47BBY' 'unknown' '24BBY'
 '57BBY' '64BBY' '200BBY' '29BBY' '44BBY' '600BBY' '21BBY' '896BBY'
 '82BBY' '31.5BBY' '15BBY' '53BBY' '31BBY' '37BBY' '41BBY' '48BBY' '8BBY'
 '92BBY' '91BBY' '62BBY' '72BBY' '54BBY' '22BBY' '58BBY' '40BBY' '102BBY'
 '67BBY' '66BBY' '46BBY']
created
Column:  created  Values:  ['2014-12-09T13:50:51.644000Z' '2014-12-10T15:10:51.357000Z'
 '2014-12-10T15:11:50.376000Z' '2014-12-10T15:18:20.704000Z'
 '2014-12-10T15:20:09.791000Z' '2014-12-10T15:52:14.024000Z'
 '2014-12-10T15:53:41.121000Z' '2014-12-10T15:57:50.959000Z'
 '2014-12-10T15:59:50.509000Z' '2014-12-10T16:16:29.192000Z'
 '2014-12-10T16:20:44.310000Z' '2014-12-10T16:26:56.138000Z'
 '2014-12-10T16:42:45.066000Z' '2014-12-10T16:49:14.582000Z'
 '2014-12-10T17:03:30.334000Z' '2014-12-10T17:11:31.638000Z'
 '2014-12-12T11:08:06.469000Z' '2014-12-12T11:16:56.569000Z'
 '2014-12-15T12:26:01.042000Z' '2014-12-15T12:48:05.971000Z'
 '
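Skipping the list columns works, but we can still inspect them. One approach is `Series.explode()` (available since pandas 0.25), which gives each list element its own row so that `unique()` works again. Converting each list to a hashable tuple is another option: it lets `unique()` compare whole lists. The sketch below runs on a small, made-up stand-in for `allCharacters`.

```python
import pandas as pd

# Small stand-in for allCharacters: 'films' holds lists, as in our data
df = pd.DataFrame({
    "name": ["Luke Skywalker", "C-3PO", "R2-D2"],
    "films": [["A New Hope", "The Empire Strikes Back"],
              ["A New Hope"],
              ["A New Hope", "The Empire Strikes Back"]],
})

# Unique individual films: one row per list element, then unique()
uniqueFilms = df["films"].explode().unique()
print(uniqueFilms)

# Unique film *combinations*: tuples are hashable, lists are not
uniqueCombos = df["films"].apply(tuple).unique()
print(uniqueCombos)
```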

First and foremost, let's take a quick look at the `created`, `edited`, and `url` columns. These hold metadata that tells us nothing about the characters themselves, and the `url` column now just duplicates our `name` column. So let's remove those columns with the `drop` method. As a reminder, `drop` does not modify the DataFrame in place by default. Instead, it returns a new DataFrame with the columns removed, so let's store that result as a new dataframe without `created`, `edited`, or `url`:

In [71]:
charactersDF = allCharacters.drop(['created', 'edited', 'url'], axis=1)
charactersDF.head()

Unnamed: 0,birth_year,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,vehicles
0,19BBY,blue,"[The Empire Strikes Back, Revenge of the Sith,...",male,blond,172,Tatooine,77,Luke Skywalker,fair,[Human],"[X-wing, Imperial shuttle]","[Snowspeeder, Imperial Speeder Bike]"
1,112BBY,yellow,"[The Empire Strikes Back, Attack of the Clones...",,,167,Tatooine,75,C-3PO,gold,[Droid],[],[]
2,33BBY,red,"[The Empire Strikes Back, Attack of the Clones...",,,96,Naboo,32,R2-D2,"white, blue",[Droid],[],[]
3,41.9BBY,yellow,"[The Empire Strikes Back, Revenge of the Sith,...",male,none,202,Tatooine,136,Darth Vader,white,[Human],[TIE Advanced x1],[]
4,19BBY,brown,"[The Empire Strikes Back, Revenge of the Sith,...",female,brown,150,Alderaan,49,Leia Organa,light,[Human],[],[Imperial Speeder Bike]
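To make the in-place point concrete, here is a minimal sketch on a toy DataFrame. The column names are borrowed from ours, and the single row is made up. It shows that `drop` returns a new DataFrame and leaves the original untouched. The `columns=` keyword is an equivalent, arguably clearer spelling of `axis=1`.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Luke Skywalker"],
                    "created": ["2014-12-09T13:50:51.644000Z"],
                    "edited": ["2014-12-20T21:17:56.891000Z"],
                    "url": ["https://swapi.co/api/people/1/"]})

# drop() returns a new DataFrame; `toy` itself is not modified
trimmed = toy.drop(columns=["created", "edited", "url"])

print(list(trimmed.columns))  # ['name']
print(list(toy.columns))      # ['name', 'created', 'edited', 'url']
```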


We've now taken a quick look at the data with `unique`, which gave us the unique values in each column. Many of the values we got back are identical: the characters share many of the same films, homeworlds, starships, vehicles, and species. Even so, we requested each URL every time it appeared. Our network calls were not too taxing here, but with much larger datasets that repetition becomes a real cost.

Instead of calling the same URL over and over again, what if we built our own dictionary of values and called each URL only once? First, we'll have to get every unique URL in our dataframe. Maybe we should've kept our original dataframe instead of overwriting it? Let's fetch it again:

In [72]:
originalDF = makeCallandReturnWholeList()

This time, let's get rid of `created`, `edited`, and `url` early. First, a quick look at what came back:

In [73]:
originalDF.head()

Unnamed: 0,birth_year,created,edited,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,url,vehicles
0,19BBY,2014-12-09T13:50:51.644000Z,2014-12-20T21:17:56.891000Z,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...",https://swapi.co/api/people/1/,"[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,2014-12-10T15:10:51.357000Z,2014-12-20T21:17:50.309000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/2/,[]
2,33BBY,2014-12-10T15:11:50.376000Z,2014-12-20T21:17:50.311000Z,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],https://swapi.co/api/people/3/,[]
3,41.9BBY,2014-12-10T15:18:20.704000Z,2014-12-20T21:17:50.313000Z,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],https://swapi.co/api/people/4/,[]
4,19BBY,2014-12-10T15:20:09.791000Z,2014-12-20T21:17:50.315000Z,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],https://swapi.co/api/people/5/,[https://swapi.co/api/vehicles/30/]


In [74]:
originalDF.drop(['created', 'edited', 'url'], axis=1, inplace=True)

We've now dropped the metadata columns we don't need much earlier in the process, so we won't accidentally make network calls for the `url` data. Let's get a new list of URL columns, this time without `url`:

In [75]:
newListOfColumnsWithURLS = list(originalListOfColumnsWithURLS)  # copy, so the original list is untouched
newListOfColumnsWithURLS.remove('url')  # because we just dropped it
originalDF[newListOfColumnsWithURLS]

Unnamed: 0,films,homeworld,species,starships,vehicles
0,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/1/,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...","[https://swapi.co/api/vehicles/14/, https://sw..."
1,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/1/,[https://swapi.co/api/species/2/],[],[]
2,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/8/,[https://swapi.co/api/species/2/],[],[]
3,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/1/,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],[]
4,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/2/,[https://swapi.co/api/species/1/],[],[https://swapi.co/api/vehicles/30/]
5,"[https://swapi.co/api/films/5/, https://swapi....",https://swapi.co/api/planets/1/,[https://swapi.co/api/species/1/],[],[]
6,"[https://swapi.co/api/films/5/, https://swapi....",https://swapi.co/api/planets/1/,[https://swapi.co/api/species/1/],[],[]
7,[https://swapi.co/api/films/1/],https://swapi.co/api/planets/1/,[https://swapi.co/api/species/2/],[],[]
8,[https://swapi.co/api/films/1/],https://swapi.co/api/planets/1/,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/12/],[]
9,"[https://swapi.co/api/films/2/, https://swapi....",https://swapi.co/api/planets/20/,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/48/, https://s...",[https://swapi.co/api/vehicles/38/]
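A note on copying lists: in Python, writing `b = a` does not copy a list. Both names refer to the same object, so calling `remove` through one name also changes what the other sees. Calling `list(a)` (or `a.copy()`) creates a new, shallow copy. Here is a quick illustration with a stand-in list of our column names:

```python
columns = ['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']

alias = columns            # same list object, two names
alias.remove('url')
print(columns)             # 'url' is gone from the original too

columns = ['films', 'homeworld', 'species', 'starships', 'url', 'vehicles']
copied = list(columns)     # a new (shallow) copy
copied.remove('url')
print(columns)             # original still contains 'url'
```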


Now that we have all of our URLs back, let's take a moment to weigh how we'd like to proceed. There are two avenues:

- Create a basic dictionary whose keys are the URLs and whose values are the data at each URL.
- Create a dataframe in which each URL sits alongside its data.

Let's do both! But first, we need to get all of the URLs out of the columns. Luckily for us, pandas has a handy method that returns every element of a column as a plain Python list, `Series.tolist()`:

In [76]:
originalDF['films'].tolist()

[['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 ['https://swapi.co/api/films/5/',
  'https://swap

This does complicate things a bit: the result is a nested list (a list of lists), with one inner list per character. Nested lists are never fun to work with, especially when we simply want the unique values. This is where pydash comes in! [Pydash](https://pydash.readthedocs.io/en/latest/index.html#) is the "kitchen sink of python utility libraries for doing 'stuff'". It is modeled on the JavaScript library Lodash and provides many small, easy-to-use helper functions.

For our purposes, we want `flatten`, which flattens a nested list by one level (exactly what we need here). Pandas Series and NumPy arrays have a `unique` method, but plain Python lists do not, so we'll also import pydash's `uniq` function, which removes duplicates from a list.

To install pydash, type into the command line:

```bash
pip install pydash
```
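If you'd rather not add a dependency, both operations are short to write by hand. Here is a plain-Python sketch. The `flattenOnce` and `uniqueInOrder` names are our own; they mirror, but are not, pydash's `flatten` and `uniq`. Strings are kept whole, which matters for single-URL columns like `homeworld`.

```python
from itertools import chain


def flattenOnce(nested):
    """Flatten one level: lists are spread out, other items are kept as-is."""
    return list(chain.from_iterable(
        item if isinstance(item, list) else [item] for item in nested
    ))


def uniqueInOrder(values):
    """Drop duplicates, keeping first-seen order (values must be hashable)."""
    return list(dict.fromkeys(values))


nested = [['films/2/', 'films/6/'], ['films/2/'], 'planets/1/']
flat = flattenOnce(nested)
print(flat)                  # ['films/2/', 'films/6/', 'films/2/', 'planets/1/']
print(uniqueInOrder(flat))   # ['films/2/', 'films/6/', 'planets/1/']
```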

In [77]:
from pydash import flatten, uniq

In [78]:
flattenedFilmUrls = flatten(originalDF['films'].tolist())
flattenedFilmUrls

['https://swapi.co/api/films/2/',
 'https://swapi.co/api/films/6/',
 'https://swapi.co/api/films/3/',
 'https://swapi.co/api/films/1/',
 'https://swapi.co/api/films/7/',
 'https://swapi.co/api/films/2/',
 'https://swapi.co/api/films/5/',
 'https://swapi.co/api/films/4/',
 'https://swapi.co/api/films/6/',
 'https://swapi.co/api/films/3/',
 'https://swapi.co/api/films/1/',
 'https://swapi.co/api/films/2/',
 'https://swapi.co/api/films/5/',
 'https://swapi.co/api/films/4/',
 'https://swapi.co/api/films/6/',
 'https://swapi.co/api/films/3/',
 'https://swapi.co/api/films/1/',
 'https://swapi.co/api/films/7/',
 'https://swapi.co/api/films/2/',
 'https://swapi.co/api/films/6/',
 'https://swapi.co/api/films/3/',
 'https://swapi.co/api/films/1/',
 'https://swapi.co/api/films/2/',
 'https://swapi.co/api/films/6/',
 'https://swapi.co/api/films/3/',
 'https://swapi.co/api/films/1/',
 'https://swapi.co/api/films/7/',
 'https://swapi.co/api/films/5/',
 'https://swapi.co/api/films/6/',
 'https://swap

In [79]:
uniqueFilmUrls = uniq(flattenedFilmUrls)

print("Length of flattened film list", len(flattenedFilmUrls))
print("Length of uniq film urls", len(uniqueFilmUrls))

Length of flattened film list 173
Length of uniq film urls 7


Already we can see a significant difference: 7 unique URLs instead of 173, which is about 4% of the calls we were making before (7 / 173 ≈ 0.04). And that's just one column. Let's write a function to get a flattened, de-duplicated list for all of the URL columns:

In [80]:
def getListOfURLs(dataFrame, listOfColumnsWithURLs):
    urlList = []
    for column in dataFrame[listOfColumnsWithURLs].columns: 
        urlList.append(dataFrame[column].tolist())
    
        
    flattenedList = flatten(urlList)
    print("URL count: ", len(flattenedList))
    uniqueFlattenedList = uniq(flattenedList)
    print("Unique URLs: ", len(uniqueFlattenedList))
    
    return uniqueFlattenedList

In [81]:
uniqueList = getListOfURLs(originalDF, newListOfColumnsWithURLS)

URL count:  435
Unique URLs:  137


These numbers look suspicious. 435 is exactly 87 (the character count the API reported) × 5 (the number of URL columns), which means we got one entry per cell rather than one per URL. Let's peer into what our function actually returned:

In [82]:
uniqueList

[['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/',
  'https://swapi.co/api/films/7/'],
 ['https://swapi.co/api/films/2/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/3/',
  'https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/6/',
  'https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/1/'],
 ['https://swapi.co/api/films/5/',
  'https://swapi.co/api/films/4/',
  'https://swa

We still have a list of lists. We needed to either flatten the list once more or use pydash's `flatten_deep` function; either works. Since we'd rather not import anything more into our project, let's just use `flatten` once more. We flatten each column's list as we pull it out of the column, and then flatten the combined list once more at the very end:

In [83]:
def getListOfURLs(dataFrame, listOfColumnsWithURLs):
    urlList = []
    for column in dataFrame[listOfColumnsWithURLs].columns: 
        flattenedSeriesList = flatten(dataFrame[column].tolist())
        urlList.append(flattenedSeriesList)
        
        
    flattenedList = flatten(urlList)
    print("URL count: ", len(flattenedList))
    uniqueFlattenedList = uniq(flattenedList)
    print("Unique URLs: ", len(uniqueFlattenedList))
    
    return uniqueFlattenedList

In [84]:
print(newListOfColumnsWithURLS)
uniqueList = getListOfURLs(originalDF, newListOfColumnsWithURLS)

['films', 'homeworld', 'species', 'starships', 'vehicles']
URL count:  386
Unique URLs:  119
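Pandas can also do this without pydash. The sketch below assumes pandas 0.25 or later and uses a toy DataFrame as a stand-in for `originalDF`; `uniqueURLs` is our own name. It stacks the URL columns into one Series and then calls `explode()` to give each list element its own row. Plain strings such as `homeworld` pass through unchanged, and empty lists become NaN. Finally, it drops the missing values and the duplicates.

```python
import pandas as pd

toyDF = pd.DataFrame({
    "films": [["https://swapi.co/api/films/2/", "https://swapi.co/api/films/6/"],
              ["https://swapi.co/api/films/2/"]],
    "homeworld": ["https://swapi.co/api/planets/1/", "https://swapi.co/api/planets/1/"],
    "vehicles": [[], ["https://swapi.co/api/vehicles/14/"]],
})


def uniqueURLs(dataFrame, columns):
    # One long Series holding every cell from the chosen columns
    allCells = pd.concat([dataFrame[c] for c in columns], ignore_index=True)
    # explode: list cells -> one row per element; [] -> NaN; strings unchanged
    return allCells.explode().dropna().drop_duplicates().tolist()


print(uniqueURLs(toyDF, ["films", "homeworld", "vehicles"]))
```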


Now that we have a complete list of unique URLs, let's try our hand at making both:
- a dataframe with each URL and its value as a row
- a dictionary mapping each URL to its value

### DataFrame Method

In [85]:
urlSeries = pd.Series(uniqueList)
urlSeries.head()

0    https://swapi.co/api/films/2/
1    https://swapi.co/api/films/6/
2    https://swapi.co/api/films/3/
3    https://swapi.co/api/films/1/
4    https://swapi.co/api/films/7/
dtype: object

In [86]:
valuesSeries = urlSeries.apply(getJsonFromRequests) 
valuesSeries.head()

0    {'title': 'The Empire Strikes Back', 'episode_...
1    {'title': 'Revenge of the Sith', 'episode_id':...
2    {'title': 'Return of the Jedi', 'episode_id': ...
3    {'title': 'A New Hope', 'episode_id': 4, 'open...
4    {'title': 'The Force Awakens', 'episode_id': 7...
dtype: object

In [87]:
dataFrameURLValue = pd.DataFrame(dict(url=urlSeries, values=valuesSeries)).reset_index()
dataFrameURLValue

Unnamed: 0,index,url,values
0,0,https://swapi.co/api/films/2/,"{'title': 'The Empire Strikes Back', 'episode_..."
1,1,https://swapi.co/api/films/6/,"{'title': 'Revenge of the Sith', 'episode_id':..."
2,2,https://swapi.co/api/films/3/,"{'title': 'Return of the Jedi', 'episode_id': ..."
3,3,https://swapi.co/api/films/1/,"{'title': 'A New Hope', 'episode_id': 4, 'open..."
4,4,https://swapi.co/api/films/7/,"{'title': 'The Force Awakens', 'episode_id': 7..."
5,5,https://swapi.co/api/films/5/,"{'title': 'Attack of the Clones', 'episode_id'..."
6,6,https://swapi.co/api/films/4/,"{'title': 'The Phantom Menace', 'episode_id': ..."
7,7,https://swapi.co/api/planets/1/,"{'name': 'Tatooine', 'rotation_period': '23', ..."
8,8,https://swapi.co/api/planets/8/,"{'name': 'Naboo', 'rotation_period': '26', 'or..."
9,9,https://swapi.co/api/planets/2/,"{'name': 'Alderaan', 'rotation_period': '24', ..."


In [88]:
dataFrameURLValue['nameOrTitle'] = dataFrameURLValue['values'].apply(getNameOrTitle)
dataFrameURLValue.head()

Unnamed: 0,index,url,values,nameOrTitle
0,0,https://swapi.co/api/films/2/,"{'title': 'The Empire Strikes Back', 'episode_...",The Empire Strikes Back
1,1,https://swapi.co/api/films/6/,"{'title': 'Revenge of the Sith', 'episode_id':...",Revenge of the Sith
2,2,https://swapi.co/api/films/3/,"{'title': 'Return of the Jedi', 'episode_id': ...",Return of the Jedi
3,3,https://swapi.co/api/films/1/,"{'title': 'A New Hope', 'episode_id': 4, 'open...",A New Hope
4,4,https://swapi.co/api/films/7/,"{'title': 'The Force Awakens', 'episode_id': 7...",The Force Awakens


The `values` series holds the JSON data obtained from each request. We then create a brand new column, `nameOrTitle`, by applying our `getNameOrTitle` function to the `values` column.

This two-step approach wasn't strictly necessary, but it's good to see that it can be done. You might be tempted to combine the steps by writing:

```python
valuesSeries = urlSeries.apply(getNameOrTitle(getJsonFromRequests))
```
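That syntax, however, is incorrect. Python evaluates `getNameOrTitle(getJsonFromRequests)` immediately, exactly once, and passes it the function object `getJsonFromRequests` itself rather than any JSON data. `apply` then receives whatever that call returns, instead of a function to call on each URL. What we need is a lambda, which defers the work and spells out where each URL goes. We could just as easily have written: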

In [89]:
valueSeriesInOneStep = urlSeries.apply(lambda x: getNameOrTitle(getJsonFromRequests(x)))
valueSeriesInOneStep.head()

0    The Empire Strikes Back
1        Revenge of the Sith
2         Return of the Jedi
3                 A New Hope
4          The Force Awakens
dtype: object
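The lambda works because it delays the call: `apply` receives a function that, for each `x`, first fetches the data and then extracts the name. The same idea can be packaged as a small helper. In the sketch below, `compose` is our own helper. It is shown with built-in string functions so that it runs without network access.

```python
import pandas as pd


def compose(outer, inner):
    """Return a new function that computes outer(inner(x))."""
    return lambda x: outer(inner(x))


names = pd.Series(["  luke skywalker ", " leia organa"])

# Equivalent to names.apply(lambda x: str.title(str.strip(x)))
cleaned = names.apply(compose(str.title, str.strip))
print(cleaned.tolist())  # ['Luke Skywalker', 'Leia Organa']
```

In the notebook, the equivalent call would be `urlSeries.apply(compose(getNameOrTitle, getJsonFromRequests))`.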

Now that we have our dataframe with URLs and their associated names, we can use that to obtain all of the values for the URLs in our original dataframe. 

Since we're looking up a single URL, we'll compare the `url` column to that URL to build a boolean Series. We then pass that Series back into the dataframe to select the matching row (boolean indexing):

In [90]:
booleanSeries = dataFrameURLValue['url'] == 'https://swapi.co/api/planets/34/'
dataFrameURLValue[ booleanSeries ]

Unnamed: 0,index,url,values,nameOrTitle
28,28,https://swapi.co/api/planets/34/,"{'name': 'Toydaria', 'rotation_period': '21', ...",Toydaria


Finally, we need our specific value from the `nameOrTitle` column. We can get it by selecting that column from the row we retrieved with the boolean series:

In [91]:
dataFrameURLValue[ booleanSeries ]['nameOrTitle']

28    Toydaria
Name: nameOrTitle, dtype: object
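Note that the result above is still a one-element Series (with index `28`), not the string itself. There are two ways to get the bare value. One is to take the first element with `.iloc[0]`. The other is to index the frame by URL once with `set_index`, after which each lookup is a single `.at[...]` access. The sketch below runs on a small, made-up stand-in for `dataFrameURLValue`.

```python
import pandas as pd

lookupDF = pd.DataFrame({
    "url": ["https://swapi.co/api/planets/1/", "https://swapi.co/api/planets/34/"],
    "nameOrTitle": ["Tatooine", "Toydaria"],
})

# Option 1: boolean mask + column label, then take the first (only) match
mask = lookupDF["url"] == "https://swapi.co/api/planets/34/"
name1 = lookupDF.loc[mask, "nameOrTitle"].iloc[0]

# Option 2: make 'url' the index once, then look up by label directly
byURL = lookupDF.set_index("url")
name2 = byURL.at["https://swapi.co/api/planets/34/", "nameOrTitle"]

print(name1, name2)  # Toydaria Toydaria
```

If a URL is missing, option 1 raises `IndexError` and option 2 raises `KeyError`.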

Now that we have our steps, let's write a function that obtains the value for every cell, though, again, we'll need a function to apply to each column. Let's be a little more creative this time and use a recursive function. If the value is a list, we map over each element and call our function again on it, so a nested list is handled the same way. Otherwise, we look the URL up in the dataframe and return its name or title:

In [92]:
def findDataFromURLDF(url, urlDataFrame):
    if type(url) is list: 
        urlValues = map(lambda x: findDataFromURLDF(x,urlDataFrame), url)
        return list(urlValues) 
    
    return urlDataFrame[urlDataFrame['url']== url]['nameOrTitle'].values[0]

In [93]:
print("List data: ", findDataFromURLDF(originalDF['films'].iloc[0], dataFrameURLValue)) 
print("Singular Data: ", findDataFromURLDF(originalDF['homeworld'].iloc[0], dataFrameURLValue))


List data:  ['The Empire Strikes Back', 'Revenge of the Sith', 'Return of the Jedi', 'A New Hope', 'The Force Awakens']
Singular Data:  Tatooine


In [94]:
def retrieveDataFromURLDataFrame(dataFrame, columnList, urlDataFrame): 
    for column in dataFrame[columnList].columns:
        dataFrame.loc[:, column] = dataFrame[column].apply(lambda x: findDataFromURLDF(x, urlDataFrame))

Before we apply our new function to every URL column in our dataframe, let's not repeat our hard-learned lesson by overwriting our original dataframe. We'll make a copy with the DataFrame's `copy` method:

In [95]:
copiedDF = originalDF.copy()
copiedDF.head()

Unnamed: 0,birth_year,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,vehicles
0,19BBY,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...","[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],[]
2,33BBY,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],[]
3,41.9BBY,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],[]
4,19BBY,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],[https://swapi.co/api/vehicles/30/]


In [96]:
retrieveDataFromURLDataFrame(copiedDF, newListOfColumnsWithURLS, dataFrameURLValue)


In [97]:
copiedDF.head()

Unnamed: 0,birth_year,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,vehicles
0,19BBY,blue,"[The Empire Strikes Back, Revenge of the Sith,...",male,blond,172,Tatooine,77,Luke Skywalker,fair,[Human],"[X-wing, Imperial shuttle]","[Snowspeeder, Imperial Speeder Bike]"
1,112BBY,yellow,"[The Empire Strikes Back, Attack of the Clones...",,,167,Tatooine,75,C-3PO,gold,[Droid],[],[]
2,33BBY,red,"[The Empire Strikes Back, Attack of the Clones...",,,96,Naboo,32,R2-D2,"white, blue",[Droid],[],[]
3,41.9BBY,yellow,"[The Empire Strikes Back, Revenge of the Sith,...",male,none,202,Tatooine,136,Darth Vader,white,[Human],[TIE Advanced x1],[]
4,19BBY,brown,"[The Empire Strikes Back, Revenge of the Sith,...",female,brown,150,Alderaan,49,Leia Organa,light,[Human],[],[Imperial Speeder Bike]
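One caveat about `copy()`: according to the pandas documentation, it copies the DataFrame's data, but the Python objects inside object columns (like our lists) are not copied recursively. Replacing cell values with new objects, as `retrieveDataFromURLDataFrame` does, is therefore safe. Mutating a list in place, however, would also change the original. The sketch below demonstrates both cases with a made-up two-row frame.

```python
import pandas as pd

original = pd.DataFrame({"films": [["A New Hope"], ["A New Hope"]]})
copied = original.copy()  # deep=True is the default

# Building new lists (as our apply-based functions do) leaves `original` alone
copied["films"] = copied["films"].apply(lambda films: films + ["Return of the Jedi"])
print(original["films"].tolist())  # [['A New Hope'], ['A New Hope']]

# Mutating a list in place changes the object that both frames share
shared = original.copy()
shared["films"].iloc[0].append("The Empire Strikes Back")
print(original["films"].iloc[0])  # ['A New Hope', 'The Empire Strikes Back']
```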


### Creating a dictionary

We can equally do this work with a dictionary. The approach is the same as above. This time, however, instead of building a separate dataframe from our URL list, we create the dictionary directly from the values pulled from the requests.

In [98]:
def createURLDictionary(urlList): 
    dictionary = {}
    for url in urlList: 
        nameTitle = getNameOrTitle(getJsonFromRequests(url))
        print("Name", nameTitle)
        dictionary[url] = nameTitle
    
    return dictionary
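The loop above can also be written as a dictionary comprehension. To keep the sketch runnable offline, the lookup function is passed in as a parameter. `buildURLDictionary` and `fakeLookup` are hypothetical names; `fakeLookup` stands in for a network call. In the notebook, you would pass something like `lambda url: getNameOrTitle(getJsonFromRequests(url))` instead.

```python
def buildURLDictionary(urlList, lookup):
    """Map each unique URL to lookup(url), calling lookup once per URL."""
    # dict.fromkeys de-duplicates while keeping first-seen order
    return {url: lookup(url) for url in dict.fromkeys(urlList)}


# Hypothetical stand-in for a network call, recording each time it runs
calls = []
def fakeLookup(url):
    calls.append(url)
    return url.rstrip("/").split("/")[-1]


urls = ["https://swapi.co/api/planets/1/", "https://swapi.co/api/planets/8/",
        "https://swapi.co/api/planets/1/"]
lookupTable = buildURLDictionary(urls, fakeLookup)
print(lookupTable)  # {'https://swapi.co/api/planets/1/': '1', 'https://swapi.co/api/planets/8/': '8'}
print(len(calls))   # 2
```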

In [99]:
startTime = gettimeStamp()
urlResponseDictionary = createURLDictionary(uniqueList)
endTime = gettimeStamp()

difference = endTime - startTime
print(difference)

2019-11-06 23:42:15.255432
Name The Empire Strikes Back
Name Revenge of the Sith
Name Return of the Jedi
Name A New Hope
Name The Force Awakens
Name Attack of the Clones
Name The Phantom Menace
Name Tatooine
Name Naboo
Name Alderaan
Name Stewjon
Name Eriadu
Name Kashyyyk
Name Corellia
Name Rodia
Name Nal Hutta
Name Bestine IV
Name unknown
Name Kamino
Name Trandosha
Name Socorro
Name Bespin
Name Mon Cala
Name Chandrila
Name Endor
Name Sullust
Name Cato Neimoidia
Name Coruscant
Name Toydaria
Name Malastare
Name Dathomir
Name Ryloth
Name Vulpter
Name Troiken
Name Tund
Name Haruun Kal
Name Cerea
Name Glee Anselm
Name Iridonia
Name Iktotch
Name Quermia
Name Dorin
Name Champala
Name Geonosis
Name Mirial
Name Serenno
Name Concord Dawn
Name Zolan
Name Ojom
Name Aleen Minor
Name Skako
Name Muunilinst
Name Shili
Name Kalee
Name Umbara
Name Utapau
Name Human
Name Droid
Name Wookiee
Name Rodian
Name Hutt
Name Yoda's species
Name Trandoshan
Name Mon Calamari
Name Ewok
Name Sullustan
Name Neimodian


In [100]:
urlResponseDictionary['https://swapi.co/api/planets/8/']

'Naboo'

Now that we have our dictionary, we no longer need to make unnecessary calls to our API. Let's rewrite our apply function so that it replaces each URL with a value from our dictionary instead of making a fresh request:

In [101]:
def getDictionaryData(value, urlResponseDictionary): 
    if type(value) is list: 
        listValues = map(lambda x: urlResponseDictionary[x], value)
        return list(listValues)
    return urlResponseDictionary[value]

In [102]:
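def getDictionaryDataForAllValues(val, urlResponseDictionary):
    # A non-empty list whose first element is a URL, or a single URL string
    isURLList = isinstance(val, list) and len(val) > 0 and 'https' in val[0]
    isURLString = isinstance(val, str) and 'https' in val
    if isURLList or isURLString:
        return getDictionaryData(val, urlResponseDictionary)
    return val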

In [103]:
def getDataForAllColumnsNotURLCall(dataFrame, columnList, urlResponseDictionary): 
    for column in dataFrame[columnList].columns: 
        print("Getting data for:", column)
        dataFrame.loc[:, column] = dataFrame[column].apply(lambda x: getDictionaryDataForAllValues(x, urlResponseDictionary))
    

In [106]:
originalDFCopyForDictionary = originalDF.copy()
originalDFCopyForDictionary.head()

Unnamed: 0,birth_year,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,vehicles
0,19BBY,blue,"[https://swapi.co/api/films/2/, https://swapi....",male,blond,172,https://swapi.co/api/planets/1/,77,Luke Skywalker,fair,[https://swapi.co/api/species/1/],"[https://swapi.co/api/starships/12/, https://s...","[https://swapi.co/api/vehicles/14/, https://sw..."
1,112BBY,yellow,"[https://swapi.co/api/films/2/, https://swapi....",,,167,https://swapi.co/api/planets/1/,75,C-3PO,gold,[https://swapi.co/api/species/2/],[],[]
2,33BBY,red,"[https://swapi.co/api/films/2/, https://swapi....",,,96,https://swapi.co/api/planets/8/,32,R2-D2,"white, blue",[https://swapi.co/api/species/2/],[],[]
3,41.9BBY,yellow,"[https://swapi.co/api/films/2/, https://swapi....",male,none,202,https://swapi.co/api/planets/1/,136,Darth Vader,white,[https://swapi.co/api/species/1/],[https://swapi.co/api/starships/13/],[]
4,19BBY,brown,"[https://swapi.co/api/films/2/, https://swapi....",female,brown,150,https://swapi.co/api/planets/2/,49,Leia Organa,light,[https://swapi.co/api/species/1/],[],[https://swapi.co/api/vehicles/30/]


In [107]:
startTime = gettimeStamp()
getDataForAllColumnsNotURLCall(originalDFCopyForDictionary, newListOfColumnsWithURLS, urlResponseDictionary)
endTime = gettimeStamp()

difference = endTime - startTime
print(difference)

2019-11-06 23:46:51.320099
Getting data for: films
Getting data for: homeworld
Getting data for: species
Getting data for: starships
Getting data for: vehicles
2019-11-06 23:46:51.334987
0:00:00.014888
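The lookups now take a fraction of a second. Python's standard library offers the same "call each URL only once" behavior through `functools.lru_cache`, which memoizes a function's results by argument. This is a different technique from the explicit dictionary used above. In the sketch below, the hypothetical `slowNameLookup` stands in for `getNameOrTitle(getJsonFromRequests(url))`.

```python
from functools import lru_cache

callCount = 0


@lru_cache(maxsize=None)  # unbounded cache keyed by the url argument
def slowNameLookup(url):
    """Hypothetical stand-in for a network request; counts real calls."""
    global callCount
    callCount += 1
    return url.rstrip("/").split("/")[-1]


urls = ["https://swapi.co/api/films/2/", "https://swapi.co/api/films/6/",
        "https://swapi.co/api/films/2/", "https://swapi.co/api/films/2/"]
names = [slowNameLookup(u) for u in urls]
print(names)      # ['2', '6', '2', '2']
print(callCount)  # 2, because repeated URLs are served from the cache
print(slowNameLookup.cache_info())
```

Cached arguments must be hashable, so the function has to be called with one URL string at a time rather than with a list of URLs.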


In [108]:
originalDFCopyForDictionary.head()

Unnamed: 0,birth_year,eye_color,films,gender,hair_color,height,homeworld,mass,name,skin_color,species,starships,vehicles
0,19BBY,blue,"[The Empire Strikes Back, Revenge of the Sith,...",male,blond,172,Tatooine,77,Luke Skywalker,fair,[Human],"[X-wing, Imperial shuttle]","[Snowspeeder, Imperial Speeder Bike]"
1,112BBY,yellow,"[The Empire Strikes Back, Attack of the Clones...",,,167,Tatooine,75,C-3PO,gold,[Droid],[],[]
2,33BBY,red,"[The Empire Strikes Back, Attack of the Clones...",,,96,Naboo,32,R2-D2,"white, blue",[Droid],[],[]
3,41.9BBY,yellow,"[The Empire Strikes Back, Revenge of the Sith,...",male,none,202,Tatooine,136,Darth Vader,white,[Human],[TIE Advanced x1],[]
4,19BBY,brown,"[The Empire Strikes Back, Revenge of the Sith,...",female,brown,150,Alderaan,49,Leia Organa,light,[Human],[],[Imperial Speeder Bike]


We've now fully filled out our dataframe (and explored a number of ways to do it)! Cleaning and sorting through data can be time-consuming and tedious, so when we come back to this we definitely don't want to rerun all of our work, especially those costly network calls.

To save our work, we'll save the dataframe itself. There are a number of formats we could use, but the most common (and most easily accessible) is CSV. CSV stands for "Comma-Separated Values", but don't let the name fool you: the same format works with other delimiters. Comma-delimiting our dataframe would be awkward. Our films, species, starships, and vehicles columns hold lists whose text form is itself comma separated, so the file would be hard to read and would rely on quoting to parse correctly. Because of that, we will pipe delimit our CSV:

In [109]:
originalDFCopyForDictionary.to_csv('starWarsCharacters.csv', sep='|')

Because we passed only a file name, the file is written to the current working directory, which for a notebook is normally the directory the notebook lives in. To save it anywhere else on the machine, pass a full path instead.

One final note: because we delimited our dataframe with pipes, we'll have to declare that when we import it in the future by passing `sep='|'` to `pd.read_csv`.
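Reading the file back involves two more details besides `sep='|'`. First, the unnamed first column holds the index we wrote out, so we pass `index_col=0`. Second, list columns come back as strings like `"['Human']"`, not as lists. `ast.literal_eval` can turn those strings back into lists; apply it only to the list columns. To keep things self-contained, the sketch below writes and reads a tiny made-up frame through an in-memory buffer instead of `starWarsCharacters.csv`.

```python
import ast
import io

import pandas as pd

frame = pd.DataFrame({"name": ["Luke Skywalker", "C-3PO"],
                      "species": [["Human"], ["Droid"]]})

buffer = io.StringIO()
frame.to_csv(buffer, sep="|")  # same call shape as our to_csv above
buffer.seek(0)

restored = pd.read_csv(buffer, sep="|", index_col=0)
print(type(restored.loc[0, "species"]))  # <class 'str'>

# Parse the string representation back into real Python lists
restored["species"] = restored["species"].apply(ast.literal_eval)
print(restored.loc[0, "species"])  # ['Human']
```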