# Analysis of Twitter Data
## Geolocation and Interactive Maps
Geolocation is the process of identifying the geographic location of an object such as a mobile phone or a computer. 

Twitter allows its users to provide their location when they publish a tweet, in the form of latitude and longitude coordinates. With this information, we can visualise our data as interactive maps.

This lab briefly introduces the GeoJSON format and **Leaflet.js**, a JavaScript library for interactive maps, and discusses their integration with the Twitter data we collected in the previous lab.

### GeoJSON

GeoJSON is a format for encoding geographic data structures. The format supports a variety of geometric types that can be used to draw the desired shapes onto a map. For our examples, we just need the simplest structure, a Point. A Point is identified by its coordinates, which GeoJSON lists in the order longitude, latitude.

In GeoJSON, we can also represent objects such as a Feature or a FeatureCollection. The first one is basically a geometry with additional properties, while the second one is a list of features.

Our Twitter data set can be represented in GeoJSON as a FeatureCollection, where each tweet is an individual Feature with its own geometry (the aforementioned Point).

This is what the JSON structure looks like:
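
As a minimal sketch, the structure can be written as a Python dictionary and serialised with `json.dumps`. The tweet text, timestamp, and coordinates below are made-up placeholders, not values from the data set. Note that the position is given as `[longitude, latitude]`.

```python
import json

# Hypothetical sample values, for illustration only
example_geo_data = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON position order is [longitude, latitude]
                "coordinates": [98.9853, 18.7883],
            },
            "properties": {
                "text": "Example tweet text",
                "created_at": "Mon Jan 01 00:00:00 +0000 2018",
            },
        }
    ],
}

print(json.dumps(example_geo_data, indent=4))
```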

### From Tweets to GeoJSON

Assuming the tweet data has been downloaded into a single file as described in the previous lab, we simply need to iterate over all the tweets looking for the coordinates field, which may or may not be present. Keep in mind that you need to use coordinates, because the geo field is deprecated (see the API).

This code reads the data set, looking for tweets where the coordinates are explicitly given. Once the GeoJSON data structure is created (in the form of a Python dictionary), the data are dumped into a file called geo_data.json:
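
The "may or may not be present" check can be sketched as a small helper. This is an illustration under assumptions: `get_point` is a hypothetical name, and the sample records are made up. It treats both a missing key and a null value as "no location".

```python
def get_point(tweet):
    """Return the tweet's GeoJSON Point, or None when no coordinates are given."""
    # .get() covers records that lack the key entirely; the value can also be
    # null (None in Python) when the user did not share a location.
    point = tweet.get('coordinates')
    return point if point else None

# Hypothetical records, for illustration only
sample_tweets = [
    {'text': 'with location',
     'coordinates': {'type': 'Point', 'coordinates': [98.98, 18.79]}},
    {'text': 'location not shared', 'coordinates': None},
    {'note': 'a record without the coordinates field'},
]

geotagged = [t for t in sample_tweets if get_point(t) is not None]
print(len(geotagged))
```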

In [2]:
import json
import sys

#fname = 'C:\\Program Files\\Anaconda2\\tweets_bigData_dataAnalytic.json'
fname = './all_tweet.json'

count = 0        # number of tweets parsed so far
tweets = []
# The data set holds one JSON-encoded tweet per line
with open(fname, 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():      # skip empty lines
            continue
        count += 1
        if count % 500 == 0:      # progress dot every 500 tweets
            sys.stdout.write('.')
        if count % 35000 == 0:    # line break after 70 dots
            sys.stdout.write('\n')
        tweets.append(json.loads(line))
print('\nDone..')

......................................................................
......................................................................
......................................................................
.................................
Done..


In [3]:
import pandas as pd

def populate_tweet_df(tweets):
    """Build a DataFrame with the text and location fields of each tweet."""
    df = pd.DataFrame()

    df['text'] = [tweet['text'] for tweet in tweets]

    df['location'] = [tweet['user']['location'] for tweet in tweets]

    # 'place' is None for tweets that are not tagged with a place
    df['country_code'] = [tweet['place']['country_code']
                          if tweet['place'] is not None else ''
                          for tweet in tweets]

    # GeoJSON order: coordinates[0] is longitude, coordinates[1] is latitude.
    # Use a real float NaN (not the string 'NaN') so the columns stay numeric.
    df['long'] = [tweet['coordinates']['coordinates'][0]
                  if tweet['coordinates'] is not None else float('nan')
                  for tweet in tweets]

    df['latt'] = [tweet['coordinates']['coordinates'][1]
                  if tweet['coordinates'] is not None else float('nan')
                  for tweet in tweets]

    return df

df = populate_tweet_df(tweets)

#print(df[0:10])
print(df[df['country_code'] != ''])

                                                     text  location  \
25762   นมัสการค่ะ 🙏🏼🤣 @ โรงเรียนสามัคคีวิทยาทาน https...             
25764   เป็นครูและนักท่องเที่ยว 🙄 @ โรงเรียนสามัคคีวิท...             
25776   Just posted a photo @ บ้านรักไทย中國雲南 https://t...             
25835   Blue vibes . @ Chiang Mai Rajabhat University ...             
25909   Mission completed 💯ครั้งแรกของการสอนนักเรียนอย...             
26096                จริง55555555 https://t.co/OxRSrw0tHX             
26097                น้ำตาจิไหลลล https://t.co/eKmP7n0IKP             
26098    ทุกวันนี้กูคุยกับคนหรือคุยกับควายอ่ะ บางทีก็งง 😒             
26099                      เธอไม่ผิดหรอก เธอแค่เห็นแก่ตัว             
26102                                              #ปล่อย             
26105                                  มันจะจบจริงๆใช่ไหม             
26106                      ฝันว่าน้องเป็นเกย์ โคตรเหี้ย 😑             
26107          พีคคค5555555555555 https://t.co/GizXozGWMo             
26108 

In [4]:
import json

geo_data = {
    "type": "FeatureCollection",
    "features": []
}

for tweet in tweets:
    if tweet['coordinates']:
        geo_json_feature = {
            "type": "Feature",
            "geometry": tweet['coordinates'],
            "properties": {
                "text": tweet['text'],
                "created_at": tweet['created_at']
            }
        }
        geo_data['features'].append(geo_json_feature)

# Save geo data
with open('geo_data.json', 'w') as fout:
    json.dump(geo_data, fout, indent=4)
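
Before plotting, it can help to check how many points were written and where they fall, for instance to pick a sensible initial view for the map. The sketch below is an illustration only: `summarize_points` is a hypothetical helper and the sample features are made up.

```python
def summarize_points(geo_data):
    """Return (count, (min_lon, min_lat, max_lon, max_lat)) for Point features."""
    lons, lats = [], []
    for feature in geo_data['features']:
        geometry = feature.get('geometry') or {}
        if geometry.get('type') == 'Point':
            # GeoJSON positions are [longitude, latitude]
            lon, lat = geometry['coordinates'][:2]
            lons.append(lon)
            lats.append(lat)
    if not lons:
        return 0, None
    return len(lons), (min(lons), min(lats), max(lons), max(lats))

# Hypothetical FeatureCollection, for illustration only
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [98.98, 18.79]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [100.5, 13.75]}},
    ],
}

count, bbox = summarize_points(sample)
print(count, bbox)
```

On the real data, load the file first (`with open('geo_data.json') as fin: geo_data = json.load(fin)`) and pass `geo_data` to the helper.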

### Interactive Maps with Leaflet.js

*Leaflet.js* is an open-source JavaScript library for interactive maps. You can create maps with tiles of your choice (e.g. from OpenStreetMap or MapBox), and overlay interactive components.

At this point, the file *geo_data.json* contains the geographic location of each geotagged tweet. A simple template to host a map is prepared as follows:
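
As a minimal sketch of such a template, the Python snippet below writes a page named `geo_tweet.html` that loads `geo_data.json` and shows each tweet as a marker with its text in a popup. The Leaflet version (1.9.4 from the unpkg CDN), the OpenStreetMap tile URL, the initial view, and the helper name `write_map_page` are all assumptions made for illustration; adjust them to your setup.

```python
from pathlib import Path

# Hypothetical template: Leaflet version, CDN URLs, tile server, and the
# initial view are assumptions, not requirements.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Geo Tweets</title>
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 600px; }</style>
</head>
<body>
  <div id="map"></div>
  <script>
    // World view; adjust the centre [lat, lng] and zoom to suit the data
    var map = L.map('map').setView([20, 0], 2);
    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
      attribution: '&copy; OpenStreetMap contributors'
    }).addTo(map);

    // Load the GeoJSON file served from the same directory
    fetch('geo_data.json')
      .then(function (response) { return response.json(); })
      .then(function (data) {
        L.geoJSON(data, {
          onEachFeature: function (feature, layer) {
            // Use a text node so tweet text is not interpreted as HTML
            var popup = document.createElement('div');
            popup.textContent = feature.properties.text;
            layer.bindPopup(popup);
          }
        }).addTo(map);
      });
  </script>
</body>
</html>
"""

def write_map_page(path='geo_tweet.html'):
    """Write the HTML page to disk and return its path."""
    out = Path(path)
    out.write_text(PAGE, encoding='utf-8')
    return out

print(write_map_page())
```

Because the page fetches `geo_data.json` over HTTP, it needs to be opened through a web server rather than directly from the file system.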

To serve the page with Python's built-in web server, follow these steps:
* Save the above HTML page as `geo_tweet.html` in your local directory
* Copy the `geo_data.json` file into the same local directory
* Start "Anaconda Prompt", change to your local directory, and run the command `python -m http.server 8889` to start the Python web server
* Open your browser at http://localhost:8889/geo_tweet.html and observe the result