# EXTRACTION
## Data Source
- Use Events API: https://api.predicthq.com/v1/events?
## Methodology
### Data Analysis - View Data
1. Convert the response to JSON and read it. Determine which variables are useful and required for the deliverable.

FINDINGS: The API returns at most 50 entries per page, for up to 100 pages, i.e. each call returns only 50 results.
### Data Analysis - Determine and Extract Data Required

- Number of entries required: 5000.
    1. Loop and set the offset at intervals of 50 (0, 50, 100, 150, etc.) until it reaches 4950; the API returns 5000 results at most.
- Variables required: id, country, category, title, start_date, end_date, location
    2. Use pandas to filter for the required variables.
        - *location holds the coordinates

# TRANSFORMATION
- Concatenate the dataframes together (from all 100 pages) and save the final dataframe to a variable.

# LOAD
- Load data into PostgreSQL

-------------------------------

# EXTRACTION

In [4]:
# Dependencies
import json
import pprint

import numpy as np
import pandas as pd
import requests
from sqlalchemy import create_engine

# Credential File: py_config.py containing variable ACCESS_TOKEN = "xxxxxxxxx"
import py_config

## Data Analysis - View Data
Convert the response to JSON and read it. Determine which variables are useful and required for the deliverable.

In [5]:
# Connect to the API URL and get data
# ACCESS_TOKEN is defined in py_config.py, which is listed in .gitignore
response = requests.get(
    url="https://api.predicthq.com/v1/events?",
    headers={
        "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
        "Accept": "application/json"
    },
    params={
        "limit": 10,  # small sample for inspection; the API allows up to 50 per page
        "country": "AU",
        "start": "2021-01-01",
        "end": "2022-12-31"
    }
)
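
As an illustration of what the `params` dictionary contributes to the request, the sketch below builds the equivalent query string with the standard library's `urlencode`. `requests` performs its own encoding internally, so this is only an approximation for inspection; `build_events_url` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.predicthq.com/v1/events"


def build_events_url(params):
    """Return BASE_URL with params appended as a query string (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode(params)}"


# Same parameters as the sample request above
sample_url = build_events_url({
    "limit": 10,
    "country": "AU",
    "start": "2021-01-01",
    "end": "2022-12-31",
})
print(sample_url)
```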

In [6]:
# Convert the response to JSON (all data)
# Save to variable "data"
data = response.json()

# Print json (formatted) and analyse which variables to use for deliverable
print(json.dumps(data, indent=4, sort_keys=True))

are a monthly market in Jamberoo incorporating craft stalls alongside fresh food, fruit, and vegetables, local crafts, flowers, and homewares as well as live entertainment. \n\nThe markets operate on Sundays (the LAST Sunday of the month NOT the 4th) and are located on the roundabout at Reid Park on Allowrie Street in Jamberoo. \n\nPlenty of parking and facilities. \n\nJamberoo Markets are a dog-friendly (on leash) destination. \n\nA variety of stalls include flowers, leatherwork, clothing, plants, organic vegetables, sweets, knitwear, orchids, bread, food, and seasonal specialties. \n\nDelicious multicultural food and music from start to finish.\n\nDestination South Coast. 10 minutes drive from Kiama and Albion Park. \nOnly 1.5 hours south of Sydney. The perfect Sunday coast drive and escape.\n\nPlease check their Instagram site for weather updates and monthly cancellations before leaving home. If you have any doubts, please call either of our organizers to check the markets are on.",

In [7]:
data['results'][0]

{'relevance': 1.0,
 'id': 'iv5AMr7UmVxSghaa2N',
 'title': 'Todd Mall Markets',
 'description': 'The Todd Mall Market is a long established outdoor market located in the heart of the iconic town of Alice Springs. It has been a popular venue for tourists and locals for over 20 years.\n\nTodd Mall comes alive with the Todd Mall Markets every second Sunday from March to December. There are a wide range of stalls selling crafts, Aboriginal art, clothing, jewellery, books, food and more. The Todd Mall now has free Wifi.\n\nOpen Times: 9am to 1pm from October to April and 9am to 2pm from May to September.',
 'category': 'community',
 'labels': ['community'],
 'rank': 45,
 'local_rank': 74,
 'aviation_rank': None,
 'phq_attendance': 562,
 'entities': [{'entity_id': 'U7KwfB6QxNzv8k7uCeJLhy',
   'name': 'Todd Mall Markets',
   'type': 'event-group',
   'category': 'community',
   'labels': ['community', 'event-group', 'recurring']}],
 'duration': 14400,
 'start': '2021-07-24T23:30:00Z',
 'end': 

## Data Analysis - Determine and Extract Data Required
- Number of entries required: 5000.
    1. Loop and set the offset at intervals of 50 (0, 50, 100, 150, etc.) until it reaches 4950; the API returns 5000 results at most.
- Variables required: id, country, category, title, start_date, end_date, location
    2. Use pandas to filter for the required variables.
        - *location holds the coordinates

In [8]:
# Test the for loop
for i in range(0, 500, 50):
    print(i)

0
50
100
150
200
250
300
350
400
450
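
The test above stops at 450. For the full extraction the offsets would run from 0 to 4950. A small sketch, assuming the 50-per-page and 5000-result limits noted in the findings (`PAGE_SIZE` and `MAX_RESULTS` are illustrative names, not API constants):

```python
PAGE_SIZE = 50       # entries returned per call, per the findings
MAX_RESULTS = 5000   # total entries wanted (assumed ceiling from the findings)

# One offset per API call
offsets = list(range(0, MAX_RESULTS, PAGE_SIZE))

print(len(offsets))   # number of calls needed
print(offsets[:4])    # first few offsets
print(offsets[-1])    # last offset
```

This confirms that 100 calls are needed and that the final call starts at offset 4950.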


In [9]:
# Create list "events_entries" to store one DataFrame per page of results.
events_entries = []


def getEntitiesName(entities):
    """Return the name of the first entity, or 'no name' if it is missing."""
    try:
        return entities[0]['name']
    except (IndexError, KeyError, TypeError):
        return 'no name'


def getEntitiesAddress(entities):
    """Return the formatted address of the first entity, or 'no address' if it is missing."""
    try:
        return entities[0]['formatted_address']
    except (IndexError, KeyError, TypeError):
        return 'no address'


def getEntitiesVenue(entities):
    """Return the name of the second entity, or 'no venue' if it is missing."""
    try:
        return entities[1]['name']
    except (IndexError, KeyError, TypeError):
        return 'no venue'


# Loop over offsets 0 to 1450 in multiples of 50 (the API returns 50 entries max per call).
# This run collects 1500 entries; use range(0, 5000, 50) for the full 5000.
for i in range(0, 1500, 50):

    response = requests.get(
        url="https://api.predicthq.com/v1/events?",
        headers={
            "Authorization": f"Bearer {py_config.ACCESS_TOKEN}",
            "Accept": "application/json"
        },
        params={
            "offset": i,
            "limit": 50,
            "country": "AU",
            "start": "2021-01-01",
            "end": "2022-12-31"
        }
    )

    # Save the decoded response to variable "data"
    data = response.json()
    # Flatten the list under data['results'] into DataFrame "events_df"
    events_df = pd.json_normalize(data, ['results'], errors='ignore')

    # Derive name, address and venue columns from the nested "entities" list
    events_df['name'] = events_df.entities.apply(getEntitiesName)
    events_df['formatted_address'] = events_df.entities.apply(getEntitiesAddress)
    events_df['venue_name'] = events_df.entities.apply(getEntitiesVenue)

    # Extract only the required variables (column headings)
    events_df = events_df[["id","title","description","category","start","end","country","location","rank","name","venue_name","formatted_address"]]

    events_entries.append(events_df)
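
The three helper functions above differ only in which entity and which key they read. Assuming `entities` is always a list of dictionaries, as in the sample record, they could be folded into one generic helper. `get_entity_field` and the sample data below are hypothetical and only illustrate the fallback behaviour.

```python
def get_entity_field(entities, index, key, default):
    """Return entities[index][key], or default when the entity or key is missing."""
    try:
        return entities[index][key]
    except (IndexError, KeyError, TypeError):
        return default


# Sample shaped like the 'entities' value in the record shown earlier
sample_entities = [
    {"entity_id": "U7KwfB6QxNzv8k7uCeJLhy", "name": "Todd Mall Markets", "type": "event-group"},
]

name = get_entity_field(sample_entities, 0, "name", "no name")
address = get_entity_field(sample_entities, 0, "formatted_address", "no address")
venue = get_entity_field(sample_entities, 1, "name", "no venue")
print(name, address, venue, sep=" | ")
```

Catching only the lookup-related exceptions keeps unrelated errors visible instead of silently turning them into placeholder values.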

# TRANSFORMATION
- Concatenate the dataframes from all fetched pages (30 pages of 50 entries in this run; 100 pages for the full 5000) and save the final dataframe to a variable.

In [10]:
# Concatenate all the DataFrames within list "events_entries" into a single DataFrame.
# Save into variable "events_entries_df"
events_entries_df = pd.concat(events_entries)

# Rename columns
events_entries_df = events_entries_df.rename(columns={'start': 'start_date', 'end': 'end_date', 'location': 'coords', 'name': 'title2'})

# Drop column title2
events_entries_df = events_entries_df.drop(columns=['title2'])
events_entries_df

Unnamed: 0,id,title,description,category,start_date,end_date,country,coords,rank,venue_name,formatted_address
0,iv5AMr7UmVxSghaa2N,Todd Mall Markets,The Todd Mall Market is a long established out...,community,2021-07-24T23:30:00Z,2021-07-25T03:30:00Z,AU,"[133.882699, -23.69983]",45,no venue,no address
1,6RzcLu26ny7ms5k9MY,Sandcastle Workshops for Children and Adults,Join Sandshapers on one of their sandcastle wo...,community,2021-07-24T23:00:00Z,2021-07-25T00:30:00Z,AU,"[153.101805, -26.406535]",0,no venue,no address
2,8LE7bwMRxtkFTPawec,Jamberoo Village Markets,Jamberoo Village Markets will be up and runnin...,community,2021-07-24T23:00:00Z,2021-07-25T04:00:00Z,AU,"[150.78307, -34.649118]",45,no venue,no address
3,ALEfzYAfeQL67jPR3g,Monthly Public Open Day,The Sydney Society of Model Engineers is the o...,community,2021-07-24T23:00:00Z,2021-07-25T06:30:00Z,AU,"[150.718767, -33.863444]",45,no venue,no address
4,AQPms3nYL5nmjh68Ee,Kingsbury Drive Community Market,The Kingsbury Drive Community Market has been ...,community,2021-07-24T23:00:00Z,2021-07-25T03:00:00Z,AU,"[145.049887, -37.726344]",45,no venue,no address
...,...,...,...,...,...,...,...,...,...,...,...
45,GJHDtWPgUkUW8QhKyc,G&S Engineering Wine and Food Day,Set under the subtle warmth of the North Queen...,festivals,2021-07-17T00:30:00Z,2021-07-17T07:30:00Z,AU,"[149.196606, -21.146868]",45,no venue,no address
46,KftGXBpKeYCHi9zBkq,Women of the River Country Exhibition,Follow the journey of these 19 remarkable wome...,expos,2021-07-17T00:30:00Z,2021-07-17T06:30:00Z,AU,"[139.316216, -34.910391]",45,no venue,no address
47,Knaz6KCdxuzvQrDTgv,Kalangadoo Farmers Market,Our weekly farmer's market has great benefits ...,community,2021-07-17T00:30:00Z,2021-07-17T03:30:00Z,AU,"[140.701655, -37.564235]",45,no venue,no address
48,a4jJMGLfcyHUoHcjSA,Northern NSW Reserve League - Newcastle Olympi...,,sports,2021-07-17T00:30:00Z,2021-07-17T00:30:00Z,AU,"[151.741019, -32.933956]",70,no venue,Hamilton South NSW 2303\nAustralia


In [11]:
events_entries_df.columns

Index(['id', 'title', 'description', 'category', 'start_date', 'end_date',
       'country', 'coords', 'rank', 'venue_name', 'formatted_address'],
      dtype='object')

In [12]:
# Check dtypes
events_entries_df.dtypes

id                   object
title                object
description          object
category             object
start_date           object
end_date             object
country              object
coords               object
rank                  int64
venue_name           object
formatted_address    object
dtype: object
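
`start_date` and `end_date` are listed as `object`, meaning they are still plain strings. A sketch of parsing them with the standard library, assuming every value follows the `YYYY-MM-DDTHH:MM:SSZ` pattern seen in the output above (`parse_utc` is a hypothetical helper):

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Start and end of the Todd Mall Markets row
start = parse_utc("2021-07-24T23:30:00Z")
end = parse_utc("2021-07-25T03:30:00Z")

duration_seconds = (end - start).total_seconds()
print(duration_seconds)
```

For this row the difference is 14400 seconds, which agrees with the `duration` field in the raw record. Within pandas, `pd.to_datetime` would be the usual way to convert the whole column.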

In [13]:
# Analyse the category types and entries within each category.
events_entries_df['category'].value_counts()

community          458
sports             401
expos              363
performing-arts    167
festivals           42
conferences         36
concerts            29
observances          2
school-holidays      2
Name: category, dtype: int64
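
A quick sanity check on these counts: they should add up to the number of rows collected (30 pages of 50 entries). The dictionary below simply copies the figures from the output above.

```python
# Category counts copied from the value_counts() output
category_counts = {
    "community": 458,
    "sports": 401,
    "expos": 363,
    "performing-arts": 167,
    "festivals": 42,
    "conferences": 36,
    "concerts": 29,
    "observances": 2,
    "school-holidays": 2,
}

total = sum(category_counts.values())
expected = 30 * 50  # pages fetched x entries per page
print(total, expected)
```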

# LOAD

In [14]:
# Connect to the local PostgreSQL database (default credentials, for everyone else)
rds_connection_string = "postgres:postgres@localhost:5432/events_db"
engine = create_engine(f'postgresql://{rds_connection_string}')

In [15]:
# Diane's use only (password redacted; substitute your own local password)
# rds_connection_string = "postgres:<password>@localhost:5432/events_db"
# engine = create_engine(f'postgresql://{rds_connection_string}')

In [16]:
# events_entries_df.to_sql(name='events_table', con=engine, if_exists='append', index=False)
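
The commented-out call above appends the DataFrame to `events_table`. As a self-contained illustration of the same append-then-read pattern, the sketch below uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL. The trimmed column list and the JSON encoding of `coords` are assumptions made for the example, not the project's schema.sql.

```python
import json
import sqlite3

# Two rows taken from the extraction output (subset of columns)
rows = [
    ("iv5AMr7UmVxSghaa2N", "Todd Mall Markets", "community", [133.882699, -23.69983]),
    ("8LE7bwMRxtkFTPawec", "Jamberoo Village Markets", "community", [150.78307, -34.649118]),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_table (id TEXT PRIMARY KEY, title TEXT, category TEXT, coords TEXT)"
)

# Lists cannot be stored directly, so coords is serialised to a JSON string
conn.executemany(
    "INSERT INTO events_table VALUES (?, ?, ?, ?)",
    [(event_id, title, category, json.dumps(coords)) for event_id, title, category, coords in rows],
)
conn.commit()

# Read the rows back, as the notebook later does with pd.read_sql_table
loaded = conn.execute("SELECT id, title, coords FROM events_table ORDER BY title").fetchall()
print(loaded)
```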

In [17]:
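# Prerequisite: create the events_db database and add its tables in PostgreSQL as per schema.sql.
# List the tables visible through the engine (Engine.table_names() is deprecated since SQLAlchemy 1.4).
from sqlalchemy import inspect
inspect(engine).get_table_names()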

['events_table']

In [18]:
events_info = pd.read_sql_table('events_table', engine) 
events_info

Unnamed: 0,id,title,description,category,start_date,end_date,country,coords,rank,venue_name,formatted_address
0,8MpZhBcd3DTi98XpjA,Entertainment Fridays at Exchange Hotel Gawler,Fridays Live is back again - DJs playing track...,performing-arts,2021-07-22 23:30:00+00:00,2021-07-23 14:29:00+00:00,AU,"{138.749307,-34.596536}",45,no venue,no address
1,u6KMRTCBHZQ2EqxXq2,Charlton Badminton Club Ladies Tournament,The Charlton Badminton Club conducts an annual...,sports,2021-07-22 23:30:00+00:00,2021-07-23 05:00:00+00:00,AU,"{143.355924,-36.269452}",45,no venue,no address
2,5DCJjv8bUXro8FjRk5,Democracy. Are You In?,Democracy. Are You In? is a contemporary exhib...,expos,2021-07-22 23:00:00+00:00,2021-07-23 07:00:00+00:00,AU,"{149.129959,-35.30196}",45,Museum of Australian Democracy,no address
3,5MqdzFWEQag8zjmuzV,Education Show,,expos,2021-07-22 23:00:00+00:00,2021-07-24 08:00:00+00:00,AU,"{144.953111,-37.825394}",61,no venue,1 Convention Centre Place\nSouth Wharf VIC 300...
4,6s3gqBBLNqXUEYBHAb,PlayUP: The Right to Have an Opinion and Be Heard,PlayUP is the Museum of Australian Democracy's...,expos,2021-07-22 23:00:00+00:00,2021-07-23 07:00:00+00:00,AU,"{149.129768,-35.301112}",45,no venue,no address
...,...,...,...,...,...,...,...,...,...,...,...
1495,mDD6Pyyt2fy2NA3hGE,Bowen parkrun,"Parkrun is a free, weekly, timed five kilometr...",sports,2021-07-13 21:00:00+00:00,2021-07-13 22:30:00+00:00,AU,"{148.252191,-19.986951}",45,no venue,no address
1496,6goKpCode6wsHBgxg7,Making Meditation Mainstream Free Beach Medita...,Making Meditation Mainstream is a community mo...,community,2021-07-13 20:30:00+00:00,2021-07-13 21:00:00+00:00,AU,"{153.120592,-26.680521}",45,no venue,no address
1497,7DDBUNp6j3VhR8pnEU,Absolute Beginners Salsa Classes,Never danced before? Not a problem! Tropical S...,community,2021-07-13 20:00:00+00:00,2021-07-13 20:30:00+00:00,AU,"{151.164101,-33.887906}",0,no venue,no address
1498,6sfpWP6VN8duPsMCnU,Screen Coach | Acting For Screen Class,Want to learn a new creative skill or develop ...,community,2021-07-13 17:30:00+00:00,2021-07-14 08:30:00+00:00,AU,"{151.27134,-33.900037}",0,no venue,no address


In [20]:
events_info['coords']

0       {138.749307,-34.596536}
1       {143.355924,-36.269452}
2        {149.129959,-35.30196}
3       {144.953111,-37.825394}
4       {149.129768,-35.301112}
                 ...           
1495    {148.252191,-19.986951}
1496    {153.120592,-26.680521}
1497    {151.164101,-33.887906}
1498     {151.27134,-33.900037}
1499     {153.40433,-27.923776}
Name: coords, Length: 1500, dtype: object

In [23]:
# Remove the leading '{' and trailing '}' from the 'coords' column
events_info['coords'] = events_info['coords'].str.strip('{}')

In [24]:
events_info['coords']

0       138.749307,-34.596536
1       143.355924,-36.269452
2        149.129959,-35.30196
3       144.953111,-37.825394
4       149.129768,-35.301112
                ...          
1495    148.252191,-19.986951
1496    153.120592,-26.680521
1497    151.164101,-33.887906
1498     151.27134,-33.900037
1499     153.40433,-27.923776
Name: coords, Length: 1500, dtype: object
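
The cleaned `coords` values are still single strings. If numeric coordinates are needed, each string can be split into two floats. The sketch assumes the order is longitude then latitude, which is consistent with the Australian values shown; `parse_coords` is a hypothetical helper.

```python
def parse_coords(text):
    """Split 'lon,lat' (optionally wrapped in braces) into a (longitude, latitude) tuple of floats."""
    longitude, latitude = text.strip("{}").split(",")
    return float(longitude), float(latitude)


print(parse_coords("138.749307,-34.596536"))
print(parse_coords("{149.129959,-35.30196}"))
```

Because the helper strips braces itself, it works on both the raw values read from the database and the cleaned ones.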

In [25]:
# Final check
events_info

Unnamed: 0,id,title,description,category,start_date,end_date,country,coords,rank,venue_name,formatted_address
0,8MpZhBcd3DTi98XpjA,Entertainment Fridays at Exchange Hotel Gawler,Fridays Live is back again - DJs playing track...,performing-arts,2021-07-22 23:30:00+00:00,2021-07-23 14:29:00+00:00,AU,"138.749307,-34.596536",45,no venue,no address
1,u6KMRTCBHZQ2EqxXq2,Charlton Badminton Club Ladies Tournament,The Charlton Badminton Club conducts an annual...,sports,2021-07-22 23:30:00+00:00,2021-07-23 05:00:00+00:00,AU,"143.355924,-36.269452",45,no venue,no address
2,5DCJjv8bUXro8FjRk5,Democracy. Are You In?,Democracy. Are You In? is a contemporary exhib...,expos,2021-07-22 23:00:00+00:00,2021-07-23 07:00:00+00:00,AU,"149.129959,-35.30196",45,Museum of Australian Democracy,no address
3,5MqdzFWEQag8zjmuzV,Education Show,,expos,2021-07-22 23:00:00+00:00,2021-07-24 08:00:00+00:00,AU,"144.953111,-37.825394",61,no venue,1 Convention Centre Place\nSouth Wharf VIC 300...
4,6s3gqBBLNqXUEYBHAb,PlayUP: The Right to Have an Opinion and Be Heard,PlayUP is the Museum of Australian Democracy's...,expos,2021-07-22 23:00:00+00:00,2021-07-23 07:00:00+00:00,AU,"149.129768,-35.301112",45,no venue,no address
...,...,...,...,...,...,...,...,...,...,...,...
1495,mDD6Pyyt2fy2NA3hGE,Bowen parkrun,"Parkrun is a free, weekly, timed five kilometr...",sports,2021-07-13 21:00:00+00:00,2021-07-13 22:30:00+00:00,AU,"148.252191,-19.986951",45,no venue,no address
1496,6goKpCode6wsHBgxg7,Making Meditation Mainstream Free Beach Medita...,Making Meditation Mainstream is a community mo...,community,2021-07-13 20:30:00+00:00,2021-07-13 21:00:00+00:00,AU,"153.120592,-26.680521",45,no venue,no address
1497,7DDBUNp6j3VhR8pnEU,Absolute Beginners Salsa Classes,Never danced before? Not a problem! Tropical S...,community,2021-07-13 20:00:00+00:00,2021-07-13 20:30:00+00:00,AU,"151.164101,-33.887906",0,no venue,no address
1498,6sfpWP6VN8duPsMCnU,Screen Coach | Acting For Screen Class,Want to learn a new creative skill or develop ...,community,2021-07-13 17:30:00+00:00,2021-07-14 08:30:00+00:00,AU,"151.27134,-33.900037",0,no venue,no address
