<a href="https://colab.research.google.com/github/chilezdengr/Connect-Sessions-on-Data-Wrangling-and-Visualisations/blob/main/session_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import numpy as np
import requests
import json

#### USING THE FOURSQUARE API

In [None]:
#We created a credentials.json file that stores the key we need to access the Foursquare API
with open('credentials.json', 'r') as f:
    credentials = json.load(f)

In [None]:
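#We could also keep the key in a txt file, but a json file is easier to work with:
#json.load() returns a dictionary, while readlines() returns a list of raw lines
with open('credentials.txt', 'r') as f:
    my_txt_cred = f.readlines()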
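
One thing to keep in mind with the text-file approach: `readlines()` keeps the trailing newline on each line, so a key read this way usually needs `.strip()` before it is used. A minimal sketch, using an in-memory file and a made-up key in place of the real `credentials.txt`:

```python
import io

# Stand-in for credentials.txt; the key below is a made-up placeholder
fake_file = io.StringIO("my-secret-key\n")

lines = fake_file.readlines()
raw_key = lines[0]            # still ends with a newline character
clean_key = raw_key.strip()   # newline and surrounding whitespace removed

print(repr(raw_key))
print(repr(clean_key))
```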

In [None]:
url = "https://api.foursquare.com/v3/places/search"

headers = {
    "Accept": "application/json",
    "Authorization": credentials['key']
}

params = {
    "query": "food",
    "radius": 90000
}

In [None]:
response = requests.request("GET", url, headers=headers, params=params)
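
The `params` dictionary ends up in the query string of the request URL. The sketch below uses the standard library's `urlencode` to illustrate the idea without sending anything; `requests` does its own encoding internally, so treat this as an approximation for simple values rather than what the library literally runs.

```python
from urllib.parse import urlencode

url = "https://api.foursquare.com/v3/places/search"
params = {"query": "food", "radius": 90000}

# Roughly what the final URL looks like once the parameters are appended
full_url = f"{url}?{urlencode(params)}"
print(full_url)
```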

In [None]:
if response.status_code==200:
    print('We are good')
else:
    print('ERROR!')

We are good


In [None]:
response.ok

True

In [None]:
data = response.json()

In [None]:
results = data['results']

In [None]:
len(results)

10

In [None]:
results[0]

{'fsq_id': '4d1aeef011fca0931a409ace',
 'categories': [{'id': 13064,
   'name': 'Pizzeria',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
    'suffix': '.png'}}],
 'chains': [],
 'distance': 47422,
 'geocodes': {'main': {'latitude': 12.007809, 'longitude': 8.552716}},
 'link': '/v3/places/4d1aeef011fca0931a409ace',
 'location': {'address': 'Hadejia road',
  'country': 'NG',
  'cross_street': '',
  'formatted_address': 'Hadejia road, Kano, Kano State',
  'locality': 'Kano',
  'region': 'Kano State'},
 'name': 'Pizza Shack',
 'related_places': {},
 'timezone': 'Africa/Lagos'}

In [None]:
df = pd.DataFrame(data['results'])

In [None]:
df['address'] = df['location'].apply(lambda x: x.get('address'))
df.head(2)

Unnamed: 0,fsq_id,categories,chains,distance,geocodes,link,location,name,related_places,timezone,address
0,4d1aeef011fca0931a409ace,"[{'id': 13064, 'name': 'Pizzeria', 'icon': {'p...",[],47422,"{'main': {'latitude': 12.007809, 'longitude': ...",/v3/places/4d1aeef011fca0931a409ace,"{'address': 'Hadejia road', 'country': 'NG', '...",Pizza Shack,{},Africa/Lagos,Hadejia road
1,51a61397498e613773d487d1,"[{'id': 13055, 'name': 'Fried Chicken Joint', ...",[],47624,"{'main': {'latitude': 12.006638, 'longitude': ...",/v3/places/51a61397498e613773d487d1,"{'address': 'MM way', 'country': 'NG', 'cross_...",Kano Fried Chicken (KFC),{},Africa/Lagos,MM way


In [None]:
df['location'][0]

{'address': 'Hadejia road',
 'country': 'NG',
 'cross_street': '',
 'formatted_address': 'Hadejia road, Kano, Kano State',
 'locality': 'Kano',
 'region': 'Kano State'}

### HOW TO DO JSON

This section covers JSON and how to use it. JSON stands for JavaScript Object Notation, a lightweight format for storing and transporting data. JSON objects look much like Python dictionaries, with a few notable differences:

- null in place of None
- true/false in place of True/False
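
A short sketch of how these differences show up in practice. The field names in the sample string are invented for illustration; the conversion between `null`/`true`/`false` and `None`/`True`/`False` is handled by the `json` module.

```python
import json

# Made-up JSON text using the JSON spellings of null, true and false
json_text = '{"name": "Pizza Shack", "verified": true, "closed": false, "rating": null}'

parsed = json.loads(json_text)
print(parsed)   # Python dictionary with True, False and None

# Going the other way, Python values are written with the JSON spellings
back_to_text = json.dumps({"rating": None, "verified": True})
print(back_to_text)
```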

In [None]:
import json

Using the DataFrame we built from the Foursquare API data, we can see that each entry in the location column is a dictionary.

Below we can see the difference between these json functions:
- load -> Read a JSON file into a Python object (for example a dictionary)
- loads -> Parse a string that is formatted as JSON into a Python object
- dump -> Write a Python object to a JSON file
- dumps -> Convert a Python object into a JSON-formatted string
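
A handy way to remember the four names: the versions ending in `s` work with strings, the others work with file objects. The sketch below shows all four side by side, with `io.StringIO` standing in for a real file so that nothing is written to disk; the `place` dictionary is a trimmed-down illustration.

```python
import io
import json

place = {"locality": "Kano", "country": "NG"}

# dumps / loads: Python object <-> string
as_text = json.dumps(place)
from_text = json.loads(as_text)

# dump / load: Python object <-> file object (StringIO stands in for a file)
buffer = io.StringIO()
json.dump(place, buffer)
buffer.seek(0)                 # rewind before reading back
from_file = json.load(buffer)

print(type(as_text).__name__, from_text == place, from_file == place)
```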

In [None]:
type(df['location'][0])

dict

In [None]:
#json.dumps() converts a dictionary to a JSON-formatted string
json.dumps(df['location'][0])

'{"address": "Hadejia road", "country": "NG", "cross_street": "", "formatted_address": "Hadejia road, Kano, Kano State", "locality": "Kano", "region": "Kano State"}'

Below: 
- We convert the df['location'][0] dictionary to a JSON string with json.dumps()
- We convert this string back to a dictionary with json.loads()

In [None]:
type(json.loads(json.dumps(df['location'][0])))

dict

In [None]:
#export as a json file
with open('testfile.json', 'w') as f:
    json.dump(df['location'][0], f)

We have exported the file as a json file and we can reimport that file

In [None]:
#read the json file back in as a dictionary
with open('testfile.json', 'r') as f:
    imported_file = json.load(f)
imported_file

In [None]:
{
    "id": 23,
    "status": "tweet a"
}

{'id': 23, 'status': 'tweet a'}

In [None]:
#importing our txt file, where each line is a separate JSON object
list_info = []
with open('tweet-json.txt') as f:
    for line in f:
        list_info.append(json.loads(line))

In [None]:
len(list_info)

2354

In [None]:
type(list_info), type(list_info[0])

(list, dict)

#### Data Type
When working with your data, here are some questions to ask yourself. You can check the answers with the type() function.
   1. What data type do we have? For the txt file: one JSON object per line, holding the key: value pairs for a single observation (a tweet).
   2. What data type do we want as the end result? This decides how we parse or wrangle the data:
            a. Pandas DataFrame
            b. List of dictionaries
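
As a sketch of getting from the first question to the second: when every line of a file is one JSON object, parsing line by line gives a list of dictionaries, which `pd.DataFrame` accepts directly. The two sample lines below are invented for illustration, and skipping blank lines is a precaution rather than something the tweet file is known to need.

```python
import io
import json

# Two made-up lines standing in for a file with one JSON object per line
sample = io.StringIO('{"id": 1, "favorite_count": 10}\n\n{"id": 2, "favorite_count": 5}\n')

# Parse each non-empty line into a dictionary
records = [json.loads(line) for line in sample if line.strip()]

print(type(records).__name__, type(records[0]).__name__)
print(records)
```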
            
  

In [None]:
#the same loop as before, this time collecting the parsed tweets in a list called data
data = []
with open('tweet-json.txt') as f:
    for line in f: 
        data.append(json.loads(line))

In [None]:
#alternatively, read every line first with readlines() and then parse each one
data = []
with open('tweet-json.txt') as f:
    tweet_info = f.readlines()
    for info in tweet_info:
        data.append(json.loads(info))

In [None]:
df = pd.DataFrame(data)
df.head(2)

Unnamed: 0,created_at,id,id_str,full_text,truncated,display_text_range,entities,extended_entities,source,in_reply_to_status_id,...,favorite_count,favorited,retweeted,possibly_sensitive,possibly_sensitive_appealable,lang,retweeted_status,quoted_status_id,quoted_status_id_str,quoted_status
0,Tue Aug 01 16:23:56 +0000 2017,892420643555336193,892420643555336193,This is Phineas. He's a mystical boy. Only eve...,False,"[0, 85]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 892420639486877696, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,39467,False,False,False,False,en,,,,
1,Tue Aug 01 00:17:27 +0000 2017,892177421306343426,892177421306343426,This is Tilly. She's just checking pup on you....,False,"[0, 138]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 892177413194625024, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,33819,False,False,False,False,en,,,,


In [None]:
#we look at the entity dictionary on the first row
first_obj = df['entities'].iloc[0]
first_obj

{'hashtags': [],
 'symbols': [],
 'user_mentions': [],
 'urls': [],
 'media': [{'id': 892420639486877696,
   'id_str': '892420639486877696',
   'indices': [86, 109],
   'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
   'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
   'url': 'https://t.co/MgUWQ76dJU',
   'display_url': 'pic.twitter.com/MgUWQ76dJU',
   'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
   'type': 'photo',
   'sizes': {'large': {'w': 540, 'h': 528, 'resize': 'fit'},
    'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
    'small': {'w': 540, 'h': 528, 'resize': 'fit'},
    'medium': {'w': 540, 'h': 528, 'resize': 'fit'}}}]}

In [None]:
first_obj.keys()

dict_keys(['hashtags', 'symbols', 'user_mentions', 'urls', 'media'])

In [None]:
#We extract the display_url of the first item in the media list
first_obj['media'][0]['display_url']

'pic.twitter.com/MgUWQ76dJU'

**A little dictionary usage hint:**
Using .get() with a dictionary returns the value of the key. If the key does not exist, it returns None, or the default value passed as the second argument.

In [None]:
our_obj = {'first':'a'}
my_ans = our_obj.get('second',[])
if my_ans is not None:
    print(my_ans)

[]


In [None]:
#this will raise a KeyError because the dictionary has no 'second' key
our_obj['second']

KeyError: 'second'
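
To put the behaviours side by side, here is a small sketch on an invented dictionary shaped like a tweet's entities but without a `media` key. The last part chains two lookups so that neither a missing key nor an empty list causes an error.

```python
# Made-up example: an entities dictionary with no 'media' key
entities = {"hashtags": [], "urls": []}

missing = entities.get("media")            # key missing, no default given
with_default = entities.get("media", [])   # key missing, explicit default
print(missing, with_default)

# Chained lookup that tolerates both a missing key and an empty list
media = entities.get("media") or [{}]
display_url = media[0].get("display_url")
print(display_url)
```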

Instead of transforming the data as above with .get() and a lambda function, we can pass a custom function that uses get() to apply() to extract the information we need, as below:

In [None]:
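def mygetentities(x):
    # 'media' holds a list of media dictionaries; it can be missing or empty
    media_list = x.get('media')
    if not media_list:
        return None
    display_info = media_list[0]  # only the first media item is used
    return display_info.get('display_url')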

In [None]:
#lambda alternative: df['entities'].apply(lambda x: x.get('media',[{}])[0].get('display_url'))
df['display_urls'] = df['entities'].apply(mygetentities)
df.head()

Unnamed: 0,created_at,id,id_str,full_text,truncated,display_text_range,entities,extended_entities,source,in_reply_to_status_id,...,favorited,retweeted,possibly_sensitive,possibly_sensitive_appealable,lang,retweeted_status,quoted_status_id,quoted_status_id_str,quoted_status,display_urls
0,Tue Aug 01 16:23:56 +0000 2017,892420643555336193,892420643555336193,This is Phineas. He's a mystical boy. Only eve...,False,"[0, 85]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 892420639486877696, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,False,False,False,False,en,,,,,pic.twitter.com/MgUWQ76dJU
1,Tue Aug 01 00:17:27 +0000 2017,892177421306343426,892177421306343426,This is Tilly. She's just checking pup on you....,False,"[0, 138]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 892177413194625024, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,False,False,False,False,en,,,,,pic.twitter.com/0Xxu71qeIV
2,Mon Jul 31 00:18:03 +0000 2017,891815181378084864,891815181378084864,This is Archie. He is a rare Norwegian Pouncin...,False,"[0, 121]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 891815175371796480, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,False,False,False,False,en,,,,,pic.twitter.com/wUnZnhtVJB
3,Sun Jul 30 15:58:51 +0000 2017,891689557279858688,891689557279858688,This is Darla. She commenced a snooze mid meal...,False,"[0, 79]","{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 891689552724799489, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,False,False,False,False,en,,,,,pic.twitter.com/tD36da7qLQ
4,Sat Jul 29 16:00:24 +0000 2017,891327558926688256,891327558926688256,This is Franklin. He would like you to stop ca...,False,"[0, 138]","{'hashtags': [{'text': 'BarkWeek', 'indices': ...","{'media': [{'id': 891327551943041024, 'id_str'...","<a href=""http://twitter.com/download/iphone"" r...",,...,False,False,False,False,en,,,,,pic.twitter.com/AtUZn91f7f


### REGEX

Searching for an exact word like Hello -> This is called a string literal
-  \* -> Matches the preceding character 0 or more times
-  \+ -> Matches the preceding character 1 or more times
-  {0,2} -> Matches the preceding character between 0 and 2 times
-  ? -> Matches the preceding character 0 or 1 time
-  . -> Matches any single character (except a newline, by default)


-  \d -> Matches any digit
-  \w -> Matches any word character: a letter, a digit or an underscore (punctuation such as .,; is not included)
-  () -> Groups a block of characters so it can be captured or repeated as a unit
-  [a-z] or [A-Z] or [0-9] or [a-zA-Z] or [a-z0-9] -> Matches any one character in the given range
-  ^a -> Matches any word/sentence that starts with the next character, in this case a
-  y$ -> Matches any word/sentence that ends with the previous character, in this case y
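
A few of the symbols above in action, as a quick sketch with Python's `re` module. The test strings are made up for illustration.

```python
import re

digits = re.findall(r"\d", "Room 42b")            # \d: digits only
words = re.findall(r"\w+", "snake_case, ok!")     # \w: letters, digits, underscore
in_range = re.findall(r"[a-c]", "abcdef")         # [a-c]: one character from a range
starts_with_a = bool(re.search(r"^a", "apple"))   # ^: anchored at the start
ends_with_y = bool(re.search(r"y$", "today"))     # $: anchored at the end
groups = re.search(r"(\d+)-(\d+)", "pages 10-12").groups()  # (): captured blocks

print(digits, words, in_range, starts_with_a, ends_with_y, groups)
```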

### USING REGEX

- toy
- tooooy
- toooooooy
- ty
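
The four words above make a convenient test set for the quantifiers. The sketch below checks which words each pattern matches in full; the pattern names in the dictionary are labels chosen here for readability.

```python
import re

words = ["toy", "tooooy", "toooooooy", "ty"]
patterns = {
    "star": r"to*y",        # zero or more o
    "plus": r"to+y",        # one or more o
    "optional": r"to?y",    # zero or one o
    "range": r"to{0,2}y",   # between zero and two o
}

# For each pattern, keep the words it matches from start to end
matches = {
    name: [w for w in words if re.fullmatch(pattern, w)]
    for name, pattern in patterns.items()
}
for name, matched in matches.items():
    print(name, matched)
```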

A regex cheatsheet: [Regex Cheatsheet](https://cheatography.com/davechild/cheat-sheets/regular-expressions/)

In [None]:
import re

In [None]:
sentence = 'It is raining today. It rained yesterday and will rain tomorrow'
ans = re.findall(r'^(It).*(today)$', sentence) #This only matches if the whole text starts with It and ends with today

In [None]:
if not ans:  #findall returns an empty list, not None, when nothing matches
    print('Not Found')
else:
    print(ans)

Not Found
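
The right "nothing found" check depends on the function used: `re.findall` always returns a list (empty when nothing matches), whereas `re.search` returns `None` on failure. A small sketch with the same sentence; the third pattern swaps in "tomorrow" so that there is a match to look at.

```python
import re

sentence = 'It is raining today. It rained yesterday and will rain tomorrow'

no_list = re.findall(r'^(It).*(today)$', sentence)      # nothing matches: empty list
no_match = re.search(r'^(It).*(today)$', sentence)      # nothing matches: None
yes_list = re.findall(r'^(It).*(tomorrow)$', sentence)  # one tuple, one item per group

print(no_list, no_match, yes_list)
```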


String parsing

In [None]:
#We can use basic string checks to find text in other strings
"Hello World" in "HelloWorld"

False

In [None]:
#We can also do this for lists
5 in [3,4,5,6]

True

In [None]:
#We can also do this for dictionaries. Here it will check if the string 'c' is in the dictionary keys
'c' in {'a':1,'b':2}

False

In [None]:
{'a':1,'b':2}.keys()

dict_keys(['a', 'b'])

In [None]:
{'a':1,'b':2}.values()

dict_values([1, 2])

In [None]:
{'a':1,'b':2}.items()

dict_items([('a', 1), ('b', 2)])

Regex use cases:
- Finding text patterns in a file, for example the variations "HelloWorld", "Hello Woooorld", "Hello Woorld"

### USING REGEX IN A FUNCTION

So why is regex important? Regex is useful for string matching, especially when you do not know exactly what the sentence will be but have some rules/guidelines as to what it might look like.
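
As a small illustration of matching by rule rather than by exact text, variations such as "HelloWorld", "Hello Woooorld" and "Hello Woorld" can all be covered by a single pattern. The pattern below is one possible choice, and "Hello Wrld" is added as a made-up counterexample.

```python
import re

samples = ["HelloWorld", "Hello Woooorld", "Hello Woorld", "Hello Wrld"]

# Optional space after Hello, then W, one or more o, then rld
pattern = re.compile(r"Hello ?Wo+rld")

hits = [s for s in samples if pattern.fullmatch(s)]
print(hits)
```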

In [None]:
import re

In [None]:
def find_name(text):
    if text:
        #Sentences such as "This is Skittles." -> capture the word before the full stop
        ans = re.search(r'^This .*is ([a-zA-Z]*)\.', text)
        if ans:
            return ans.group(1)
        #Sentences such as "Meet Skittles" -> capture the word after Meet
        ans = re.search(r'^Meet ([a-zA-Z]*)', text)
        if ans:
            return ans.group(1)
    return None

In [None]:
find_name("This is Skittles. I would kidnap Skittles. Pink dog in back hasn't moved in days. 12/10 https://t.co/2wm0POA9N2")

'Skittles'
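
The same idea can be arranged as a list of patterns tried in order, which makes it easier to add new sentence openings later. This is a hypothetical variant rather than the function above: `extract_name` and `NAME_PATTERNS` are names made up here, the patterns are slightly stricter (the name must directly follow "This is"), and the second and third sample texts are invented.

```python
import re

# Hypothetical variant: one compiled pattern per sentence opening
NAME_PATTERNS = [
    re.compile(r"^This is ([A-Za-z]+)\."),
    re.compile(r"^Meet ([A-Za-z]+)"),
]

def extract_name(text):
    """Return the first captured name, or None if no pattern matches."""
    if not text:
        return None
    for pattern in NAME_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None

tweets = [
    "This is Skittles. I would kidnap Skittles.",
    "Meet Bella. She loves naps.",
    "Here we have a dog with no name.",
]
names = [extract_name(t) for t in tweets]
print(names)
```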

### Write data as text

In [None]:
#Write some data to a text file; note that the data must be a string
test_string = 'This is some fun text.'
with open('unnamedtextfile.txt', 'w') as f:
    f.write(test_string)
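
To see why the data must be a string, the sketch below tries to write a dictionary directly, which fails with a `TypeError`, and then writes it after converting it with `json.dumps()`. The file goes into a temporary folder, and the path and variable names are chosen here for illustration.

```python
import json
import os
import tempfile

record = {"id": 23, "status": "tweet a"}

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # throwaway location
with open(path, "w") as f:
    try:
        f.write(record)            # not a string, so this fails
    except TypeError as err:
        error_name = type(err).__name__
    f.write(json.dumps(record))    # convert to a string first, then write

with open(path) as f:
    read_back = f.read()

print(error_name, read_back)
```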