# EXTRACTION
## Data Source
- Use Events API: https://api.predicthq.com/v1/events?
## Methodology
### Data Analysis - View Data
1. Convert the response to JSON and read it. Determine which variables are useful and required for the deliverable.

FINDINGS: The API returns at most 50 entries per page, for up to 100 pages, i.e. each call returns no more than 50 results.
### Data Analysis - Determine and Extract Data Required

- Number of entries required: 5000.
    1. Loop and set the offset at intervals of 50 (0, 50, 100, 150, and so on up to 4950). The API returns 5000 results at most.
- Variables required: id, country, category, title, start_date, end_date, location
    2. Use pandas to filter for the variables required.
        - *location holds the coordinates

# TRANSFORMATION
- Concatenate the dataframes together (from all 100 pages) and save the final dataframe to a variable.

# LOAD
- Load data into PostgreSQL

-------------------------------

# EXTRACTION

In [1]:
# Dependencies
import json

import pandas as pd
import requests
from sqlalchemy import create_engine

# Credential File: py_config.py containing variable ACCESS_TOKEN = "xxxxxxxxx"
import py_config

## Data Analysis - View Data
Convert to json format and read json. Determine which variables are useful and required for the deliverable.

In [2]:
# Connect to the API URL and get data
# ACCESS_TOKEN is defined in py_config.py, which is listed in .gitignore
response = requests.get(
    url="https://api.predicthq.com/v1/events?",
    headers={
      "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
      "Accept": "application/json"
    },
    params={
      "limit": 10,
      "country": "AU",
      "start": "2021-01-01",
      "end": "2022-12-31"
    }
)

In [3]:
# Convert data to json format (all data)
# Save to variable "data"
data = response.json()

# Print json (formatted) and analyse which variables to use for deliverable
print(json.dumps(data, indent=4, sort_keys=True))

{
    "count": 5000,
    "next": "https://api.predicthq.com/v1/events/?country=AU&end=2022-12-31&limit=10&offset=10&start=2021-01-01",
    "overflow": true,
    "previous": null,
    "results": [
        {
            "aviation_rank": null,
            "brand_safe": true,
            "category": "community",
            "country": "AU",
            "description": "The Busy Peacock is a place to come and let the children experiment and create. Be it mud, paint, glue, goo, water play, sensory rice play or otherwise, all those things you really want your child to experience but just not in your own home.\n\n45 minute play session running Tuesday \u2013 Sunday, check the calendar for session times and availability.",
            "duration": 2700,
            "end": "2021-07-21T00:15:00Z",
            "entities": [
                {
                    "category": "community",
                    "entity_id": "Dn7bwAXxAEg29PzwifNf5V",
                    "labels": [
                        

In [4]:
data['results'][0]

{'relevance': 1.0,
 'id': 'UZGsZMeY5YegR7MMdD',
 'title': 'Messy Play Sessions',
 'description': 'The Busy Peacock is a place to come and let the children experiment and create. Be it mud, paint, glue, goo, water play, sensory rice play or otherwise, all those things you really want your child to experience but just not in your own home.\n\n45 minute play session running Tuesday – Sunday, check the calendar for session times and availability.',
 'category': 'community',
 'labels': ['community'],
 'rank': 0,
 'local_rank': 0,
 'aviation_rank': None,
 'phq_attendance': None,
 'entities': [{'entity_id': 'Dn7bwAXxAEg29PzwifNf5V',
   'name': 'Messy Play Sessions',
   'type': 'event-group',
   'category': 'community',
   'labels': ['community', 'event-group', 'recurring']}],
 'duration': 2700,
 'start': '2021-07-20T23:30:00Z',
 'end': '2021-07-21T00:15:00Z',
 'updated': '2020-11-18T06:48:00Z',
 'first_seen': '2020-11-18T06:35:06Z',
 'timezone': 'Australia/Melbourne',
 'location': [145.18578,

In [5]:
# Extract data within dictionary key "results"
# Save to variable "data1"
data1 = data['results']
print(json.dumps(data1, indent=4, sort_keys=True))
print('-----------------')

[
    {
        "aviation_rank": null,
        "brand_safe": true,
        "category": "community",
        "country": "AU",
        "description": "The Busy Peacock is a place to come and let the children experiment and create. Be it mud, paint, glue, goo, water play, sensory rice play or otherwise, all those things you really want your child to experience but just not in your own home.\n\n45 minute play session running Tuesday \u2013 Sunday, check the calendar for session times and availability.",
        "duration": 2700,
        "end": "2021-07-21T00:15:00Z",
        "entities": [
            {
                "category": "community",
                "entity_id": "Dn7bwAXxAEg29PzwifNf5V",
                "labels": [
                    "community",
                    "event-group",
                    "recurring"
                ],
                "name": "Messy Play Sessions",
                "type": "event-group"
            }
        ],
        "first_seen": "2020-11-18T06:35:0

In [28]:
address_df.entities.head()[0]

[{'name': 'Taree Zone Field',
  'entity_id': '35xJ2iHewafn5WFK7qnt2Sh',
  'formatted_address': '21 Manning River Drive\nTaree NSW 2430\nAustralia',
  'type': 'venue'}]

In [11]:
address_df = pd.json_normalize(data['results'])
address_df.head(20)

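def getEntitiesName(entities):
    # Fall back to a placeholder when the list is empty or the key is missing
    try:
        return entities[0]['name']
    except (IndexError, KeyError, TypeError):
        return 'no name'
address_df['name'] = address_df.entities.apply(getEntitiesName)

def getEntitiesAddress(entities):
    # Fall back to a placeholder when the list is empty or the key is missing
    try:
        return entities[0]['formatted_address']
    except (IndexError, KeyError, TypeError):
        return 'no address'
address_df['formatted_address'] = address_df.entities.apply(getEntitiesAddress)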

address_df.head(3)

def getEntitiesVenue(entities):
    # print(entities)
    if len(entities)>1:
        print(entities)
    try:
        return entities[1]['name']
    except IndexError:
        return 'no venue'
address_df['venue_name'] = address_df.entities.apply(getEntitiesVenue)

address_df

[{'entity_id': 'G3EQ2FUNKgmcdjgtCy7YkX', 'name': 'Future of Financial Services', 'type': 'event-group', 'formatted_address': '8 Whiteman St Southbank VIC 3006', 'category': 'conferences', 'labels': ['conference', 'event-group']}, {'name': 'The Pub at Crown', 'entity_id': 'nSefEmFFYeKvx6YFVUThjj', 'formatted_address': '8 Whiteman St\nSouthbank VIC 3006\nAustralia', 'type': 'venue'}]
[{'entity_id': 'aTApFCCi542T6J2Eu43jGX', 'name': 'The Trevor Kennedy Collection: Highlights', 'type': 'event-group', 'category': 'expos', 'labels': ['event-group', 'expo', 'recurring']}, {'name': 'National Museum of Australia', 'entity_id': '3B3FqxeXvKHJTL7hN4wS3UL', 'formatted_address': 'Lawson Crescent Acton Peninsula\nCanberra ACT 2600\nAustralia', 'type': 'venue'}]
[{'entity_id': 'DwUuivF6i5yYfcSn9hRKBg', 'name': 'Truth, Power and a Free Press', 'type': 'event-group', 'category': 'expos', 'labels': ['event-group', 'expo', 'recurring']}, {'name': 'Museum of Australian Democracy', 'entity_id': 'nSfpQNnykcD

Unnamed: 0,relevance,id,title,description,category,labels,rank,local_rank,aviation_rank,phq_attendance,...,location,scope,country,place_hierarchies,state,brand_safe,private,name,formatted_address,venue_name
0,1.0,UZGsZMeY5YegR7MMdD,Messy Play Sessions,The Busy Peacock is a place to come and let th...,community,[community],0,0,,,...,"[145.18578, -38.261048]",locality,AU,"[[6295630, 6255151, 2077456, 2145234, 7839813,...",active,True,False,Messy Play Sessions,no address,no venue
1,1.0,eQN8f47RzTXvibpxY7,INVOLVING FAMILIES AND OTHERS IN THE CARE OF Y...,This workshop will explore the benefits and ba...,conferences,"[conference, health]",0,0,,,...,"[144.947904, -37.780223]",locality,AU,"[[6295630, 6255151, 2077456, 2145234, 7839805,...",active,True,False,no name,no address,no venue
2,1.0,57NrcFdTDNZcHGZ5Jw,Future of Financial Services,,conferences,[conference],46,56,0.0,600.0,...,"[144.959213, -37.82341]",locality,AU,"[[6295630, 6255151, 2077456, 2145234, 7839805,...",active,True,False,Future of Financial Services,8 Whiteman St Southbank VIC 3006,The Pub at Crown
3,1.0,5gt6YrELDVgyH9H6N4,The Trevor Kennedy Collection: Highlights,Discover objects of rare beauty and items of c...,expos,[expo],45,70,,562.0,...,"[149.119532, -35.292481]",locality,AU,"[[6295630, 6255151, 2077456, 2177478, 2172517]]",active,True,False,The Trevor Kennedy Collection: Highlights,no address,National Museum of Australia
4,1.0,65grzvvvAQcmZH7rim,Gunnedah Saleyards,Experience one of the largest stock selling ce...,expos,[expo],45,73,,562.0,...,"[150.224511, -30.958274]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 7839725,...",active,True,False,Gunnedah Saleyards,no address,no venue
5,1.0,7yXLeY7ZcWoLx8YQU9,PlayUP: The Right to Have an Opinion and Be Heard,PlayUP is the Museum of Australian Democracy's...,expos,[expo],45,70,,562.0,...,"[149.129768, -35.301112]",locality,AU,"[[6295630, 6255151, 2077456, 2177478, 2172517]]",active,True,False,PlayUP: The Right to Have an Opinion and Be Heard,no address,no venue
6,1.0,9JFGB6yX8SjKcm3BZm,Indigenous Stock Workers and Rodeo Riders Disp...,An informative and visual display in recogniti...,expos,"[community, expo]",45,75,,562.0,...,"[141.081268, -17.668348]",locality,AU,"[[6295630, 6255151, 2077456, 2152274, 7839568,...",active,True,False,Indigenous Stock Workers and Rodeo Riders Disp...,no address,no venue
7,1.0,Aii4mGwYmU6GLoCY97,Fitness Industry Technology Summit,,conferences,"[conference, health, technology]",40,47,0.0,300.0,...,"[151.202228, -33.871976]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 6619279,...",active,True,False,Dockside,Wheat Road\nSydney NSW 2000\nAustralia,no venue
8,1.0,EJqNn748zA5y6wMUQq,The Polished Opal,The workshop starts with a short talk about op...,community,[community],0,0,,,...,"[150.333328, -33.702448]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 2175228,...",active,True,False,The Polished Opal,no address,no venue
9,1.0,EuRTzyaAcG7KyFV5E3,"Truth, Power and a Free Press","Truth, Power and a Free press is a compelling ...",expos,[expo],45,71,,562.0,...,"[149.129959, -35.30196]",locality,AU,"[[6295630, 6255151, 2077456, 2177478, 2172517]]",active,True,False,"Truth, Power and a Free Press",no address,Museum of Australian Democracy
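
The output above suggests that the position of an entity in the list is not a reliable guide to what it is: in some rows the first entity is an event-group, in others it is the venue itself (for example 'Dockside' ends up in the name column while venue_name is 'no venue'). A minimal sketch of one alternative is to select by the entity's `type` field instead of by index. The helper names `first_entity_of_type` and `venue_fields` are hypothetical, and the sample list is abridged from the printed entities above.

```python
def first_entity_of_type(entities, entity_type):
    """Return the first entity dict whose 'type' matches, or None."""
    for entity in entities or []:
        if entity.get("type") == entity_type:
            return entity
    return None


def venue_fields(entities):
    """Return (venue_name, formatted_address), with placeholders when missing."""
    venue = first_entity_of_type(entities, "venue")
    if venue is None:
        return "no venue", "no address"
    return venue.get("name", "no venue"), venue.get("formatted_address", "no address")


# Abridged from the printed output above: event-group first, venue second.
sample = [
    {"entity_id": "G3EQ2FUNKgmcdjgtCy7YkX", "name": "Future of Financial Services",
     "type": "event-group", "formatted_address": "8 Whiteman St Southbank VIC 3006"},
    {"entity_id": "nSefEmFFYeKvx6YFVUThjj", "name": "The Pub at Crown",
     "type": "venue", "formatted_address": "8 Whiteman St\nSouthbank VIC 3006\nAustralia"},
]
print(venue_fields(sample))
```

With a dataframe, something like `address_df.entities.apply(venue_fields)` would return one tuple per row, which could then be split into two columns.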


In [124]:
address_df.entities.head()

0    [{'name': 'Taree Zone Field', 'entity_id': '35...
1    [{'entity_id': 'R3k2PETx6g7EcbUESWT6yK', 'name...
2    [{'name': 'Churchill Reserve', 'entity_id': 'K...
3    [{'name': 'Pluim Park', 'entity_id': '34spMLXP...
4    [{'name': 'Crestwood Oval', 'entity_id': 'SKz8...
Name: entities, dtype: object

In [53]:
address_df = pd.json_normalize(data1,['entities'], errors='ignore')
address_df.head(20)


Unnamed: 0,name,entity_id,formatted_address,type,category,labels
0,Taree Zone Field,35xJ2iHewafn5WFK7qnt2Sh,21 Manning River Drive\nTaree NSW 2430\nAustralia,venue,,
1,Entertainment Sunday's at Exchange Hotel Gawler,R3k2PETx6g7EcbUESWT6yK,,event-group,performing-arts,"[event-group, performing-arts, recurring]"
2,Churchill Reserve,KsxdmsHkyeYuNdduzxYMtH,Saint Albans VIC 3021\nAustralia,venue,,
3,Pluim Park,34spMLXP39JDr5jaELLDaTH,Tuggerah Street\nLisarow NSW 2250\nAustralia,venue,,
4,Crestwood Oval,SKz88y3kU3e7D64TXakAfB,"Peel Rd, Baulkham Hills, New South Wales\nNew ...",venue,,
5,Hawker Football Centre,ZXEZVcSqUzcFFhM3ucnr7w,Walhallow Street\nHawker ACT 2614\nAustralia,venue,,
6,Mascot Oval,34kvcUUN95XaDbeRA5QbSFA,Sydney\nAustralia,venue,,
7,Kambah 2,qgmmxuwCCi4EiygNhHGq29,"Laidlaw Place, Kambah, Tuggeranong\nCanberra\n...",venue,,
8,Nijong Oval,qpbFJkbre7AXHan9CQLbdf,"Barrack Street, Cooma\nCanberra\nAustralia",venue,,
9,Sydney University Football Ground,iF4LF7yVRW78AdSpr3cYyz,"Regimental Drive, Camperdown\nSydney\nAustralia",venue,,


In [66]:
# View json data in dataframe format using pandas. (Note: variable "data" includes all data)
# variable "events_df" dataframe from dictionary data > results 
events_df = pd.json_normalize(data, record_path = ['results'], meta=['count'], errors='ignore')
events_df.head(20)

Unnamed: 0,relevance,id,title,description,category,labels,rank,local_rank,aviation_rank,phq_attendance,...,first_seen,timezone,location,scope,country,place_hierarchies,state,brand_safe,private,count
0,1.0,nXLhaY4VxcRzeYryGo,Northern NSW Women's Premier League - Mid Coas...,,sports,"[soccer, sport]",40,69,0.0,302.0,...,2021-02-05T00:55:49Z,Australia/Sydney,"[152.493976, -31.896234]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 7839365,...",active,True,False,5000
1,1.0,DvUzX5WEvAbgqsUBYP,Entertainment Sunday's at Exchange Hotel Gawler,Sundays Live is back again - DJs playing track...,performing-arts,[performing-arts],45,73,,562.0,...,2021-01-18T06:35:26Z,Australia/Adelaide,"[138.749307, -34.596536]",locality,AU,"[[6295630, 6255151, 2077456, 2061327, 7839422,...",active,True,False,5000
2,1.0,2uBUSSa46e4B7GSMxn,Victoria NPL - St. Albans Saints vs Dandenong ...,,sports,"[soccer, sport]",41,60,0.0,367.0,...,2021-01-01T08:41:11Z,Australia/Melbourne,"[144.814351, -37.733046]",locality,AU,"[[6295630, 6255151, 2077456, 2145234, 7839533,...",active,True,False,5000
3,1.0,3E2aZxFTZcBddMBDt7,New South Wales NPL 2 - Central Coast II vs Ha...,,sports,"[soccer, sport]",46,74,0.0,605.0,...,2021-01-01T10:23:21Z,Australia/Sydney,"[151.376692, -33.379516]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 2164848,...",active,True,False,5000
4,1.0,3FNA2mAKQG6jvw5iaM,Ron Massey Cup - Hills District Bulls vs St Ma...,,sports,"[rugby, sport]",45,65,,562.0,...,2021-03-10T22:03:24Z,Australia/Sydney,"[150.972036, -33.749092]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 7839690,...",active,True,False,5000
5,1.0,3f7TFQrGCbv58VjXQs,Capital Territory NPL Women - Canberra United ...,,sports,"[soccer, sport]",46,68,0.0,630.0,...,2021-01-01T12:54:38Z,Australia/Sydney,"[149.037341, -35.24206]",locality,AU,"[[6295630, 6255151, 2077456, 2177478, 2172517]...",active,True,False,5000
6,1.0,3oLAaXKL4mMFYwPoRy,Ron Massey Cup - Kaiviti Silktails vs Blacktow...,,sports,"[rugby, sport]",45,58,,562.0,...,2021-03-10T22:03:28Z,Australia/Sydney,"[151.190408, -33.925581]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 6619279,...",active,True,False,5000
7,1.0,4J6CkhjLEgtDkuADGC,Capital Territory NPL - Tuggeranong United vs ...,,sports,"[soccer, sport]",41,65,0.0,339.0,...,2021-01-01T12:26:03Z,Australia/Sydney,"[149.063047, -35.388245]",locality,AU,"[[6295630, 6255151, 2077456, 2177478, 2143964]...",active,True,False,5000
8,1.0,6DZt449GvipSwcE3tZ,Capital Territory NPL - Cooma Tigers vs West C...,,sports,"[soccer, sport]",40,69,0.0,302.0,...,2021-01-01T12:26:03Z,Australia/Sydney,"[149.120377, -36.231124]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 7839716,...",active,True,False,5000
9,1.0,AuSNWZ4pv3EHgPU5YZ,New South Wales NPL Women - Sydney University ...,,sports,"[soccer, sport]",41,52,0.0,362.0,...,2021-01-01T10:25:39Z,Australia/Sydney,"[151.184874, -33.8865]",locality,AU,"[[6295630, 6255151, 2077456, 2155400, 6619279,...",active,True,False,5000


In [7]:
# events_df.join(f_address_df)
events_final_df = pd.merge(events_df, address_df, left_on="title", right_on="name")
events_final_df[["title","name"]].head(10)

Unnamed: 0,title,name
0,Messy Play Sessions,Messy Play Sessions
1,Future of Financial Services,Future of Financial Services
2,The Trevor Kennedy Collection: Highlights,The Trevor Kennedy Collection: Highlights
3,Gunnedah Saleyards,Gunnedah Saleyards
4,PlayUP: The Right to Have an Opinion and Be Heard,PlayUP: The Right to Have an Opinion and Be Heard
5,Indigenous Stock Workers and Rodeo Riders Disp...,Indigenous Stock Workers and Rodeo Riders Disp...
6,The Polished Opal,The Polished Opal
7,"Truth, Power and a Free Press","Truth, Power and a Free Press"


In [8]:
# View all columns (in dictionary: events > results). Determine required columns for deliverable.
events_final_df.columns

Index(['relevance', 'id', 'title', 'description', 'category_x', 'labels_x',
       'rank', 'local_rank', 'aviation_rank', 'phq_attendance', 'entities',
       'duration', 'start', 'end', 'updated', 'first_seen', 'timezone',
       'location', 'scope', 'country', 'place_hierarchies', 'state',
       'brand_safe', 'private', 'entity_id', 'name', 'type', 'category_y',
       'labels_y', 'formatted_address'],
      dtype='object')

## Data Analysis - Determine and Extract Data Required
- Number of entries required: 5000.
    1. Loop and set the offset at intervals of 50 (0, 50, 100, 150, and so on up to 4950). The API returns 5000 results at most.
- Variables required: id, country, category, title, start_date, end_date, location
    2. Use pandas to filter for the variables required.
        - *location holds the coordinates
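
Since location arrives as a two-element list, it may be easier to load into a database as two numeric columns. The sketch below assumes the order is [longitude, latitude], which is inferred from the sample values in this notebook (145.18578 cannot be a latitude) rather than confirmed from the API documentation. The function name `split_location` is hypothetical.

```python
def split_location(location):
    """Split a [longitude, latitude] pair into a dict of two floats.

    Assumes longitude comes first, as the sample data in this notebook suggests.
    Returns None values when the input is missing or malformed.
    """
    try:
        longitude, latitude = location
    except (TypeError, ValueError):
        return {"longitude": None, "latitude": None}
    return {"longitude": float(longitude), "latitude": float(latitude)}


# Value taken from the first event shown in this notebook.
print(split_location([145.18578, -38.261048]))
```

Applied to a dataframe column, `events_df["location"].apply(split_location)` would give one dict per row that can be expanded into separate columns.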

In [9]:
# Test for loop 
for i in range (0,500,50):
    print(i)

0
50
100
150
200
250
300
350
400
450


In [7]:
# Create list "events_entries" to store one DataFrame per page.
# Each iteration of the loop appends the DataFrame built from that page's results.
events_entries=[]

# Loop over offsets in multiples of 50 (the API returns at most 50 entries per call).
# The range stops at 1500 here, so this run fetches 30 pages; use range(0, 5000, 50) for all 5000 entries.
for i in range(0,1500,50):

    response = requests.get(
        url=f"https://api.predicthq.com/v1/events?offset={i}&limit=50",
        headers={
        "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
        "Accept": "application/json"
        },
        params={
            "country": "AU",
            "start": "2021-01-01",
            "end": "2022-12-31"
        }
    )

    # Save response to variable "data" and "data1"
    data = response.json()
    # data1 = data['results']
    # Save to variable "events_df" the dictionary (data> results) 
    events_df = pd.json_normalize(data, ['results'], errors='ignore')
    # print(events_df.head(3))

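    def getEntitiesName(entities):
        # Fall back to a placeholder when the list is empty or the key is missing
        try:
            return entities[0]['name']
        except (IndexError, KeyError, TypeError):
            return 'no name'

    events_df['name'] = events_df.entities.apply(getEntitiesName)

    def getEntitiesAddress(entities):
        # Fall back to a placeholder when the list is empty or the key is missing
        try:
            return entities[0]['formatted_address']
        except (IndexError, KeyError, TypeError):
            return 'no name'
    events_df['formatted_address'] = events_df.entities.apply(getEntitiesAddress)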


    def getEntitiesVenue(entities):
        # Print entity lists with more than one entry for inspection
        if len(entities)>1:
            print(entities)
        try:
            return entities[1]['name']
        except IndexError:
            return 'no venue'
    events_df['venue_name'] = events_df.entities.apply(getEntitiesVenue)


    # Extract out only required variables (column headings)
    events_df = events_df[["id","title","description","category","start","end","country","location","rank","name","formatted_address"]]
        
    events_entries.append(events_df)


    # TEST
    # print(events_entries)
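
The loop above always requests a fixed number of pages. Because the response shown earlier includes a `next` link (and `results` may come back short or empty), one possible refinement is to stop as soon as the API signals there is nothing more. The sketch below is an illustration under assumptions: `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake function stands in for the real `requests.get` call so the example runs without network access or a token.

```python
def fetch_all(fetch_page, limit=50, max_results=5000):
    """Collect results page by page until the API runs out or the cap is hit.

    fetch_page(offset, limit) must return the decoded JSON dict for one page.
    """
    results = []
    for offset in range(0, max_results, limit):
        page = fetch_page(offset, limit)
        batch = page.get("results", [])
        results.extend(batch)
        # Stop early on an empty page or when there is no "next" link.
        if not batch or page.get("next") is None:
            break
    return results


def fake_fetch_page(offset, limit, total=120):
    """Stand-in for the real request: serves `total` fake events in pages."""
    ids = list(range(offset, min(offset + limit, total)))
    next_url = "more" if offset + limit < total else None
    return {"count": total, "next": next_url, "results": [{"id": i} for i in ids]}


events = fetch_all(fake_fetch_page)
print(len(events))
```

A real `fetch_page` would wrap the `requests.get` call from the cell above, pass `offset` and `limit` in `params`, and could call `response.raise_for_status()` before `response.json()` so that failed requests are not silently turned into empty pages.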

# TRANSFORMATION
- Concatenate the dataframes together (from all 100 pages) and save the final dataframe to a variable.

In [8]:
# Concatenate all the DataFrames in the list "events_entries"
# into a single DataFrame.
# Save into variable "events_entries_df"
events_entries_df = pd.concat(events_entries)
events_entries_df

Unnamed: 0,id,title,description,category,start,end,country,location,rank,name,formatted_address
0,UZGsZMeY5YegR7MMdD,Messy Play Sessions,The Busy Peacock is a place to come and let th...,community,2021-07-20T23:30:00Z,2021-07-21T00:15:00Z,AU,"[145.18578, -38.261048]",0,Messy Play Sessions,no name
1,eQN8f47RzTXvibpxY7,INVOLVING FAMILIES AND OTHERS IN THE CARE OF Y...,This workshop will explore the benefits and ba...,conferences,2021-07-20T23:30:00Z,2021-07-21T06:30:00Z,AU,"[144.947904, -37.780223]",0,no name,no name
2,57NrcFdTDNZcHGZ5Jw,Future of Financial Services,,conferences,2021-07-20T23:00:00Z,2021-07-21T08:00:00Z,AU,"[144.959213, -37.82341]",46,Future of Financial Services,8 Whiteman St Southbank VIC 3006
3,5gt6YrELDVgyH9H6N4,The Trevor Kennedy Collection: Highlights,Discover objects of rare beauty and items of c...,expos,2021-07-20T23:00:00Z,2021-07-21T07:00:00Z,AU,"[149.119532, -35.292481]",45,The Trevor Kennedy Collection: Highlights,no name
4,65grzvvvAQcmZH7rim,Gunnedah Saleyards,Experience one of the largest stock selling ce...,expos,2021-07-20T23:00:00Z,2021-07-21T01:00:00Z,AU,"[150.224511, -30.958274]",45,Gunnedah Saleyards,no name
...,...,...,...,...,...,...,...,...,...,...,...
45,h8Tgn52FZ9ehPwsngi,Queensland NPL 2 Youth - Sunshine Coast U23 vs...,,sports,2021-07-11T04:00:00Z,2021-07-11T04:00:00Z,AU,"[153.119022, -26.731251]",40,Kawana SC Field 1,Milieu Place\nKawana Waters\nAustralia
46,kWzqFJbmKmN2tU82rQ,Private photography class in Sydney,"Develop your composition and shooting skills, ...",community,2021-07-11T04:00:00Z,2021-07-11T07:00:00Z,AU,"[151.208565, -33.858768]",0,no name,no name
47,7URTbV4WsM39x4UTsC,Queensland NPL Youth League - Brisbane Roar U2...,,sports,2021-07-11T03:45:00Z,2021-07-11T03:45:00Z,AU,"[153.262453, -27.532772]",40,Arthur & Allan Morris Field (Cleveland Showgro...,"60 - 76 Waterloo Street, Cleveland\nBrisbane\n..."
48,LauEV6prCSrCp5hjbr,Queensland NPL 2 Youth - Souths United U23 vs ...,,sports,2021-07-11T03:45:00Z,2021-07-11T03:45:00Z,AU,"[153.069671, -27.590388]",46,Wakerley Park,23 Dew St.\nRuncorn QLD 4113\nAustralia


In [15]:
events_entries_df.columns

Index(['id', 'title', 'description', 'category', 'start', 'end', 'country',
       'location', 'rank', 'name', 'formatted_address'],
      dtype='object')

In [10]:
# Analyse the category types and entries within each category.
events_entries_df['category'].value_counts()

community          476
expos              439
sports             300
performing-arts    181
festivals           41
conferences         35
concerts            23
school-holidays      3
observances          2
Name: category, dtype: int64

# LOAD

In [13]:
# Connect to local database
rds_connection_string = "postgres:postgres@localhost:5432/events_db"
engine = create_engine(f'postgresql://{rds_connection_string}')

In [14]:
# Create an events_db database. Add the tables into the Postgres database as per schema.sql
engine.table_names()

OperationalError: (psycopg2.OperationalError) FATAL:  database "events_db" does not exist

(Background on this error at: http://sqlalche.me/e/13/e3q8)
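
The error above indicates that events_db has to exist before the engine can connect to it. One way to create it from Python, sketched here under assumptions, is to connect to a different database on the same server and issue CREATE DATABASE from there. The sketch assumes the default maintenance database named "postgres" exists and that the user is allowed to create databases; `admin_url` and `create_database` are hypothetical helper names. Only `admin_url` is exercised below, since `create_database` needs a running PostgreSQL server.

```python
def admin_url(db_url, admin_db="postgres"):
    """Swap the database name at the end of a connection URL."""
    base, _, _ = db_url.rpartition("/")
    return f"{base}/{admin_db}"


def create_database(db_url):
    """Issue CREATE DATABASE for the database named at the end of db_url."""
    # Imported here so admin_url can be tried without SQLAlchemy installed.
    from sqlalchemy import create_engine, text

    db_name = db_url.rpartition("/")[2]
    # CREATE DATABASE cannot run inside a transaction block, hence AUTOCOMMIT.
    engine = create_engine(admin_url(db_url), isolation_level="AUTOCOMMIT")
    with engine.connect() as conn:
        conn.execute(text(f'CREATE DATABASE "{db_name}"'))
    engine.dispose()


url = "postgresql://postgres:postgres@localhost:5432/events_db"
print(admin_url(url))
```

Creating the database by hand (for example in pgAdmin or with the `createdb` command-line tool) before running the LOAD cells would achieve the same result.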