# EXTRACTION
## Data Source
- Use the PredictHQ Events API: https://api.predicthq.com/v1/events?
## Methodology
### Data Analysis - View Data
1. Convert the API response to JSON and inspect it to determine which variables are useful and required for the deliverable.

FINDINGS: The API returns at most 50 entries per call (one page) and allows up to 100 pages, so at most 5000 results can be retrieved by stepping through the pages.
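The paging arithmetic in the findings can be checked with a small sketch. It assumes the limits stated above (50 results per page, at most 100 pages); `page_offsets` is a hypothetical helper, not part of the PredictHQ API.

```python
def page_offsets(total_wanted, page_size=50, max_pages=100):
    """Return the offsets needed to page through `total_wanted` results.

    Assumes the limits described in the findings: `page_size` results per call
    and at most `max_pages` pages, i.e. page_size * max_pages results in total.
    """
    max_results = page_size * max_pages          # 50 * 100 = 5000
    total = min(total_wanted, max_results)       # cannot page beyond the cap
    return list(range(0, total, page_size))


offsets = page_offsets(5000)
print(len(offsets), offsets[:4], offsets[-1])    # 100 [0, 50, 100, 150] 4950
```

Asking for more than 5000 results still yields 100 offsets, because the cap is applied before the offsets are generated.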
### Data Analysis - Determine and Extract Data Required

- Number of entries required: 5000.
    1. Loop over the pages, setting the offset in steps of 50 (0, 50, 100, 150, ... up to 4950), since the API returns at most 5000 results.
- Variables required: id, title, category, start_date, end_date, country, location
    2. Use pandas to filter for the variables required.
        - *location is the event's coordinates
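The filtering step can be illustrated on a single raw event without pandas. This sketch keeps only the required keys and splits `location` into longitude and latitude. The [longitude, latitude] order is an assumption inferred from the sample output (e.g. `[115.782705, -31.968148]` for the Mount Claremont market in Western Australia); `select_fields` and `REQUIRED` are hypothetical names, and the raw API keys are `start`/`end` before they are renamed to `start_date`/`end_date`.

```python
REQUIRED = ["id", "title", "category", "start", "end", "country", "location"]


def select_fields(event, fields=REQUIRED):
    """Keep only the required keys of one raw event (missing keys become None)."""
    row = {key: event.get(key) for key in fields}
    # Assumption: "location" is [longitude, latitude], as in the sample output.
    location = row.pop("location", None) or [None, None]
    row["longitude"], row["latitude"] = location
    return row


raw_event = {
    "id": "TpCYmYhbA4RmmGjZio",
    "title": "Mount Claremont Farmers Market",
    "category": "community",
    "start": "2021-07-23T23:30:00Z",
    "end": "2021-07-24T03:30:00Z",
    "country": "AU",
    "location": [115.782705, -31.968148],
    "relevance": 1.0,  # an extra field that is dropped
}
print(select_fields(raw_event))
```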

# TRANSFORMATION
- Concatenate the per-page dataframes (one per page, up to 100 pages) into one final dataframe and save it to a variable.

# LOAD
- Load the data into PostgreSQL

-------------------------------

# EXTRACTION

In [1]:
# Dependencies
import json

import pandas as pd
import requests
from sqlalchemy import create_engine

# Credential file: py_config.py containing the variable ACCESS_TOKEN = "xxxxxxxxx"
import py_config

## Data Analysis - View Data
Convert the API response to JSON and inspect it to determine which variables are useful and required for the deliverable.

In [2]:
# Connect to the API URL and request a first sample of events
# ACCESS_TOKEN is defined in py_config.py, which is listed in .gitignore
response = requests.get(
    url="https://api.predicthq.com/v1/events?",
    headers={
        "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
        "Accept": "application/json"
    },
    params={
        "limit": 10,
        "country": "AU",
        "start": "2021-01-01",
        "end": "2022-12-31"
    }
)
# Raise an error early if the request failed (e.g. an invalid token)
response.raise_for_status()
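`requests` URL-encodes the `params` dict into the query string. The standard library's `urllib.parse.urlencode` shows what a single page request looks like with the same filter values as above; passing `offset` and `limit` this way avoids building the URL by hand. `page_url` is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.predicthq.com/v1/events"


def page_url(offset, limit=50, **filters):
    """Build the URL for one page of results (no network call is made)."""
    params = {"offset": offset, "limit": limit, **filters}
    return f"{BASE_URL}?{urlencode(params)}"


print(page_url(100, country="AU", start="2021-01-01", end="2022-12-31"))
# https://api.predicthq.com/v1/events?offset=100&limit=50&country=AU&start=2021-01-01&end=2022-12-31
```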

In [3]:
# Convert the response to JSON (a Python dict) and save it to the variable "data"
data = response.json()

# Print the formatted JSON and analyse which variables to use for the deliverable
print(json.dumps(data, indent=4, sort_keys=True))

... (output truncated) ...
            "duration": 14400,
            "end": "2021-07-24T03:30:00Z",
            "entities": [
                {
                    "category": "community",
                    "entity_id": "37JrqKCC6ThJ2QkAjEb9jpD",
                    "labels": [
                        "community",
... (output truncated) ...

In [4]:
data['results'][0]

{'relevance': 1.0,
 'id': 'TpCYmYhbA4RmmGjZio',
 'title': 'Mount Claremont Farmers Market',
 'description': "The Mount Claremont Farmers' Market is an exciting initiative. Predominantly a fresh produce market operating every Saturday morning at the Mount Claremont Primary School, the market is a school/community-based project that provides an opportunity for farmers, food producers and chefs to sell farm-origin and associated gourmet food products directly to their customers. All proceeds from the Market go to supporting the children of the school, as well as various community projects.\n\nPatrons go mostly for the fresh produce but also to grab coffee, breakfast or brunch. Families take advantage of the school playgrounds.\n\nThe vibe is relaxed yet energised and interactive, the sense of organised turmoil adds informality and humour. Customers are buying more than just an evening meal!\n\nThe air at the markets is always filled with tempting aromas from hot food vendors. Enjoy espres
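Printing the whole response produces very long output. A lighter way to decide which variables to keep is to list the top-level keys and the union of keys across `data['results']`. This sketch assumes only that the response is a dict with a `results` list, as used in the cells that follow; `summarize_keys` is a hypothetical helper and `sample` is a toy response, not real API output.

```python
def summarize_keys(data):
    """Return (sorted top-level keys, sorted union of keys found in data['results'])."""
    top_level = sorted(data.keys())
    result_keys = set()
    for event in data.get("results", []):
        result_keys.update(event.keys())
    return top_level, sorted(result_keys)


# Toy response for illustration only
sample = {
    "results": [
        {"id": "TpCYmYhbA4RmmGjZio", "title": "Mount Claremont Farmers Market", "rank": 45},
        {"id": "u6rJBcDTSLvdZC8NfF", "title": "Doll Fair", "description": ""},
    ],
}
print(summarize_keys(sample))  # (['results'], ['description', 'id', 'rank', 'title'])
```

On the real response, `summarize_keys(data)` would list every field name once, which makes it easier to pick the columns kept later.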

## Data Analysis - Determine and Extract Data Required
- Number of entries required: 5000.
    1. Loop over the pages, setting the offset in steps of 50 (0, 50, 100, 150, ... up to 4950), since the API returns at most 5000 results.
- Variables required: id, title, category, start_date, end_date, country, location
    2. Use pandas to filter for the variables required.
        - *location is the event's coordinates

In [5]:
# Test the for loop: offsets in steps of 50
for i in range(0, 500, 50):
    print(i)

0
50
100
150
200
250
300
350
400
450


In [6]:
# Create the list "events_entries" to store one DataFrame per page of results
events_entries = []

# Helper functions to pull nested fields out of each event's "entities" list.
# They return a placeholder when the entity or the field is missing.
def getEntitiesName(entities):
    try:
        return entities[0]['name']
    except (IndexError, KeyError, TypeError):
        return 'no name'

def getEntitiesAddress(entities):
    try:
        return entities[0]['formatted_address']
    except (IndexError, KeyError, TypeError):
        return 'no address'

def getEntitiesVenue(entities):
    try:
        return entities[1]['name']
    except (IndexError, KeyError, TypeError):
        return 'no venue'

# Loop over offsets in multiples of 50 (the API returns at most 50 entries per call).
# The plan calls for offsets 0 to 4950 (5000 entries); this run uses range(0, 1500, 50),
# i.e. 30 pages / 1500 entries.
for i in range(0, 1500, 50):

    response = requests.get(
        url="https://api.predicthq.com/v1/events?",
        headers={
            "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
            "Accept": "application/json"
        },
        params={
            "offset": i,
            "limit": 50,
            "country": "AU",
            "start": "2021-01-01",
            "end": "2022-12-31"
        }
    )
    response.raise_for_status()

    # Save the response (a dict) to the variable "data"
    data = response.json()

    # Stop early if the API has no more results
    if not data.get('results'):
        break

    # Flatten the list under data['results'] into a DataFrame (one row per event)
    events_df = pd.json_normalize(data, record_path=['results'], errors='ignore')

    events_df['name'] = events_df['entities'].apply(getEntitiesName)
    events_df['formatted_address'] = events_df['entities'].apply(getEntitiesAddress)
    events_df['venue_name'] = events_df['entities'].apply(getEntitiesVenue)

    # Keep only the required variables (column headings)
    events_df = events_df[["id", "title", "description", "category", "start", "end", "country",
                           "location", "rank", "name", "venue_name", "formatted_address"]]

    events_entries.append(events_df)


    # TEST
    # print(events_entries)
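The three `getEntities...` helpers share one pattern: read a key from the entity at a given position, or fall back to a placeholder. A single generic helper could replace them. `entity_field` is a hypothetical name; like the code above, it assumes the first entity holds the name and address and the second holds the venue.

```python
from functools import partial


def entity_field(entities, index, key, default):
    """Return entities[index][key], or `default` if the entity or key is missing."""
    try:
        return entities[index][key]
    except (IndexError, KeyError, TypeError):  # TypeError covers entities=None or NaN
        return default


get_name = partial(entity_field, index=0, key="name", default="no name")
get_address = partial(entity_field, index=0, key="formatted_address", default="no address")
get_venue = partial(entity_field, index=1, key="name", default="no venue")

# With pandas each one is applied the same way, e.g.
# events_df['venue_name'] = events_df['entities'].apply(get_venue)

entities = [{"entity_id": "37JrqKCC6ThJ2QkAjEb9jpD", "category": "community"}]
print(get_name(entities), get_venue(entities))  # no name no venue
```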

# TRANSFORMATION
- Concatenate the per-page dataframes (one per page) into one final dataframe and save it to a variable.

In [7]:
# Concatenate all the DataFrames in the list "events_entries" into one DataFrame
# and save it to the variable "events_entries_df"
events_entries_df = pd.concat(events_entries)

# Rename columns
events_entries_df = events_entries_df.rename(columns={'start': 'start_date', 'end': 'end_date', 'location': 'coords', 'name': 'title2'})

# Drop column title2
events_entries_df = events_entries_df.drop(columns=['title2'])
events_entries_df

,id,title,description,category,start_date,end_date,country,coords,rank,venue_name,formatted_address
0,TpCYmYhbA4RmmGjZio,Mount Claremont Farmers Market,The Mount Claremont Farmers' Market is an exci...,community,2021-07-23T23:30:00Z,2021-07-24T03:30:00Z,AU,"[115.782705, -31.968148]",45,no venue,no address
1,ZuwQHdYq85DxzPdyB2,Margaret River Farmers Market,The Margaret River Farmers' Market has been a ...,community,2021-07-23T23:30:00Z,2021-07-24T03:30:00Z,AU,"[115.073923, -33.957078]",45,no venue,no address
2,u6rJBcDTSLvdZC8NfF,Doll Fair,,expos,2021-07-23T23:30:00Z,2021-07-25T02:00:00Z,AU,"[150.874911, -33.929269]",62,no venue,424-458 Hoxton Park Road\nLiverpool West NSW 2...
3,5FyorXaGSq9v4pbCZ9,Benalla Farmers' Market,The Benalla Farmers Market has plenty of stall...,community,2021-07-23T23:00:00Z,2021-07-24T03:00:00Z,AU,"[145.983564, -36.552345]",45,no venue,no address
4,66Rm432Hr5Dq8KYfwW,The National 2021 at AGNSW,14 artist projects at the Art Gallery of New S...,expos,2021-07-23T23:00:00Z,2021-07-24T07:00:00Z,AU,"[151.214894, -33.871516]",45,no venue,no address
...,...,...,...,...,...,...,...,...,...,...,...
45,fx3JfhhiqoysxrXR8g,Learn to make cocktails!,"Learn to make a variety of cocktails, with gui...",community,2021-07-15T08:00:00Z,2021-07-15T11:00:00Z,AU,"[151.213768, -33.877746]",0,no venue,no address
46,hAUz3PzJrRsK4FsNZG,Absolute Beginners Bachata Classes,Never danced before? Not a problem! Tropical S...,community,2021-07-15T08:00:00Z,2021-07-15T08:30:00Z,AU,"[151.164101, -33.887906]",0,no venue,no address
47,hhF9KV2dDyP5c2Vdr8,Evening Adult Art Classes,Learn to see as an artist and develop you skil...,community,2021-07-15T08:00:00Z,2021-07-15T09:30:00Z,AU,"[152.921212, -29.671462]",0,no venue,no address
48,x7LXaqKZxNVGKYWo94,HSC Up Close,Jesse Whelan from Sandbox Learning Australia w...,conferences,2021-07-15T08:00:00Z,2021-07-15T09:30:00Z,AU,"[151.244995, -33.89296]",0,no venue,32 48 Denison Street\nBondi Junction NSW 2022\...
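Each page's DataFrame is indexed from 0, so `pd.concat` keeps repeated index labels; this is visible above, where the last rows are labelled 45 to 48 rather than in the 1400s. Passing `ignore_index=True` gives a fresh 0 to n-1 index. A toy sketch with two small made-up pages standing in for the per-page DataFrames:

```python
import pandas as pd

# Two toy "pages" standing in for per-page DataFrames (made-up rows)
page_1 = pd.DataFrame({"id": ["a", "b"], "rank": [45, 62]})
page_2 = pd.DataFrame({"id": ["c", "d"], "rank": [0, 45]})

stacked = pd.concat([page_1, page_2])                        # index: 0, 1, 0, 1
renumbered = pd.concat([page_1, page_2], ignore_index=True)  # index: 0, 1, 2, 3

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Since the load step calls `to_sql(..., index=False)`, the repeated index never reaches the database; `ignore_index=True` mainly makes the DataFrame easier to inspect and slice by position.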


In [8]:
events_entries_df.columns

Index(['id', 'title', 'description', 'category', 'start_date', 'end_date',
       'country', 'coords', 'rank', 'venue_name', 'formatted_address'],
      dtype='object')

In [9]:
# Check dtypes
events_entries_df.dtypes

id                   object
title                object
description          object
category             object
start_date           object
end_date             object
country              object
coords               object
rank                  int64
venue_name           object
formatted_address    object
dtype: object
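`start_date` and `end_date` are still strings (`object`). In pandas, `pd.to_datetime(..., utc=True)` converts such a column; the standard-library sketch below parses the `...Z` timestamp format seen in the output and computes an event's duration. It assumes every timestamp uses exactly this format; `parse_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a timestamp like '2021-07-23T23:30:00Z' into a timezone-aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


start = parse_utc("2021-07-23T23:30:00Z")   # Mount Claremont Farmers Market
end = parse_utc("2021-07-24T03:30:00Z")
duration = (end - start).total_seconds()
print(duration)  # 14400.0
```

The result agrees with the `"duration": 14400` field in the raw JSON for a market with the same start and end times.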

In [10]:
# Analyse the category types and entries within each category.
events_entries_df['category'].value_counts()

community          486
expos              421
sports             293
performing-arts    178
festivals           49
conferences         35
concerts            34
school-holidays      2
observances          2
Name: category, dtype: int64
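The category counts add up to the number of rows fetched, a quick check that the loop retrieved 30 full pages of 50 (offsets 0 to 1450) rather than the planned 5000 entries. The counts are copied from the output above.

```python
counts = {
    "community": 486, "expos": 421, "sports": 293, "performing-arts": 178,
    "festivals": 49, "conferences": 35, "concerts": 34,
    "school-holidays": 2, "observances": 2,
}

total = sum(counts.values())
pages, remainder = divmod(total, 50)   # 50 entries per page
print(total, pages, remainder)         # 1500 30 0
```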

# LOAD

In [None]:
# Connect to the local PostgreSQL database
# Default credentials (user "postgres", password "postgres"); edit to match your setup
rds_connection_string = "postgres:postgres@localhost:5432/events_db"
engine = create_engine(f'postgresql://{rds_connection_string}')

In [11]:
# Diane's use only (personal password not stored in the notebook)
# rds_connection_string = "postgres:<password>@localhost:5432/events_db"
# engine = create_engine(f'postgresql://{rds_connection_string}')
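Instead of keeping per-person passwords in the notebook, the connection string can be assembled from environment variables. A minimal sketch; using libpq's conventional variable names (`PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`) and the defaults shown is an assumption, and `postgres_url` is a hypothetical helper. `quote_plus` escapes characters such as `@` or `/` that would otherwise break the URL.

```python
import os
from urllib.parse import quote_plus


def postgres_url(database="events_db"):
    """Build a PostgreSQL connection URL from environment variables.

    Defaults mirror the local setup above (postgres/postgres on localhost:5432).
    """
    user = os.environ.get("PGUSER", "postgres")
    password = quote_plus(os.environ.get("PGPASSWORD", "postgres"))
    host = os.environ.get("PGHOST", "localhost")
    port = os.environ.get("PGPORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# engine = create_engine(postgres_url())
```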

In [12]:
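# Load the data into events_table. Commented out after the first load: re-running it with
# if_exists='append' inserts the same ids again and violates the primary key (error below).
# events_entries_df.to_sql(name='events_table', con=engine, if_exists='append', index=False)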

IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "events_table_pkey"
DETAIL:  Key (id)=(TpCYmYhbA4RmmGjZio) already exists.

[SQL: INSERT INTO events_table (id, title, description, category, start_date, end_date, country, coords, rank, venue_name, formatted_address) VALUES (%(id)s, %(title)s, %(description)s, %(category)s, %(start_date)s, %(end_date)s, %(country)s, %(coords)s, %(rank)s, %(venue_name)s, %(formatted_address)s)]
[parameters: ({'id': 'TpCYmYhbA4RmmGjZio', 'title': 'Mount Claremont Farmers Market', ...}, {'id': 'ZuwQHdYq85DxzPdyB2', 'title': 'Margaret River Farmers Market', ...}  ... displaying 10 of 1500 total bound parameter sets ...  {'id': 'x7LXaqKZxNVGKYWo94', 'title': 'HSC Up Close', ...}, {'id': 'x8tFddmghJwsbCrUbX', 'title': 'EQUINOX TOWNSVILLE 2021', ...})]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
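One way to make the load re-runnable is to skip rows whose `id` already exists with `INSERT ... ON CONFLICT (id) DO NOTHING`, which PostgreSQL supports. To stay self-contained, this sketch runs against an in-memory SQLite database (SQLite 3.24 and later accept the same clause); the table is a cut-down, hypothetical stand-in for `events_table`, not the project's schema.sql.

```python
import sqlite3

rows = [
    ("TpCYmYhbA4RmmGjZio", "Mount Claremont Farmers Market", 45),
    ("ZuwQHdYq85DxzPdyB2", "Margaret River Farmers Market", 45),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_table (id TEXT PRIMARY KEY, title TEXT, rank INTEGER)")

insert_sql = (
    "INSERT INTO events_table (id, title, rank) VALUES (?, ?, ?) "
    "ON CONFLICT (id) DO NOTHING"   # skip ids that are already loaded
)
conn.executemany(insert_sql, rows)
conn.executemany(insert_sql, rows)  # a second load no longer raises an IntegrityError
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events_table").fetchone()[0]
print(count)  # 2
```

With psycopg2 against PostgreSQL the placeholders would be `%s` rather than `?`; the `ON CONFLICT` clause itself is unchanged.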

In [13]:
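# List the tables in events_db (the database and its tables are created beforehand as per schema.sql)
# inspect(engine).get_table_names() replaces the deprecated engine.table_names()
from sqlalchemy import inspect
inspect(engine).get_table_names()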

['events_table']

In [14]:
events_info = pd.read_sql_table('events_table', engine) 
events_info

,id,title,description,category,start_date,end_date,country,coords,rank,venue_name,formatted_address
0,TpCYmYhbA4RmmGjZio,Mount Claremont Farmers Market,The Mount Claremont Farmers' Market is an exci...,community,2021-07-23 23:30:00+00:00,2021-07-24 03:30:00+00:00,AU,"{115.782705,-31.968148}",45,no venue,no address
1,ZuwQHdYq85DxzPdyB2,Margaret River Farmers Market,The Margaret River Farmers' Market has been a ...,community,2021-07-23 23:30:00+00:00,2021-07-24 03:30:00+00:00,AU,"{115.073923,-33.957078}",45,no venue,no address
2,u6rJBcDTSLvdZC8NfF,Doll Fair,,expos,2021-07-23 23:30:00+00:00,2021-07-25 02:00:00+00:00,AU,"{150.874911,-33.929269}",62,no venue,424-458 Hoxton Park Road\nLiverpool West NSW 2...
3,5FyorXaGSq9v4pbCZ9,Benalla Farmers' Market,The Benalla Farmers Market has plenty of stall...,community,2021-07-23 23:00:00+00:00,2021-07-24 03:00:00+00:00,AU,"{145.983564,-36.552345}",45,no venue,no address
4,66Rm432Hr5Dq8KYfwW,The National 2021 at AGNSW,14 artist projects at the Art Gallery of New S...,expos,2021-07-23 23:00:00+00:00,2021-07-24 07:00:00+00:00,AU,"{151.214894,-33.871516}",45,no venue,no address
...,...,...,...,...,...,...,...,...,...,...,...
1495,fx3JfhhiqoysxrXR8g,Learn to make cocktails!,"Learn to make a variety of cocktails, with gui...",community,2021-07-15 08:00:00+00:00,2021-07-15 11:00:00+00:00,AU,"{151.213768,-33.877746}",0,no venue,no address
1496,hAUz3PzJrRsK4FsNZG,Absolute Beginners Bachata Classes,Never danced before? Not a problem! Tropical S...,community,2021-07-15 08:00:00+00:00,2021-07-15 08:30:00+00:00,AU,"{151.164101,-33.887906}",0,no venue,no address
1497,hhF9KV2dDyP5c2Vdr8,Evening Adult Art Classes,Learn to see as an artist and develop you skil...,community,2021-07-15 08:00:00+00:00,2021-07-15 09:30:00+00:00,AU,"{152.921212,-29.671462}",0,no venue,no address
1498,x7LXaqKZxNVGKYWo94,HSC Up Close,Jesse Whelan from Sandbox Learning Australia w...,conferences,2021-07-15 08:00:00+00:00,2021-07-15 09:30:00+00:00,AU,"{151.244995,-33.89296}",0,no venue,32 48 Denison Street\nBondi Junction NSW 2022\...
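After the round trip, `coords` is displayed in PostgreSQL array notation (`{115.782705,-31.968148}`) rather than as the original list, so it appears to be stored as text. If numeric coordinates are needed again, the string can be parsed back. A minimal sketch, assuming the `{longitude,latitude}` format shown above; `parse_coords` is a hypothetical helper.

```python
def parse_coords(text):
    """Turn '{115.782705,-31.968148}' into a (longitude, latitude) tuple of floats."""
    lon, lat = (float(part) for part in text.strip("{}").split(","))
    return lon, lat


print(parse_coords("{115.782705,-31.968148}"))  # (115.782705, -31.968148)
```

In pandas this could be applied column-wise, e.g. `events_info['coords'].apply(parse_coords)`.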
