# Map data visualisation

This notebook visualises the geolocation of tweets in several ways.

First, we import the Python packages used in this project.

### Imports 

In [6]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import re  # regular expressions for data cleaning
import geotext  # extracting locations from text
import geopy  # geocoding
import folium  # map visualisation

Next, we load the data.

### Data extraction

In [7]:
data = pd.read_csv("./archive/tweets.csv", nrows=10000)
data.head()

Unnamed: 0,date,content,hashtags,like_count,rt_count,followers_count,isVerified,language,coordinates,place,source
0,2023-02-21 03:30:04+00:00,तुर्की में सोमवार देर रात भूंकप के तेज झटके मह...,"['ATDigital', 'Turkey', 'Earthquake', 'TurkeyE...",0,0,19727712,True,hi,,,Twitter Media Studio
1,2023-02-21 03:29:07+00:00,New search &amp; rescue work is in progress in...,"['Hatay', 'earthquakes', 'Türkiye', 'TurkiyeQu...",1,0,5697,True,en,,,Twitter Web App
2,2023-02-21 03:29:04+00:00,Can't imagine those who still haven't recovere...,"['Turkey', 'earthquake', 'turkeyearthquake2023...",0,0,1,False,en,,,Twitter for Android
3,2023-02-21 03:28:06+00:00,its a highkey sign for all of us to ponder ove...,"['turkeyearthquake2023', 'earthquake', 'Syria']",0,0,3,False,en,,,Twitter for Android
4,2023-02-21 03:27:38+00:00,Turkiye Earthquake: तुर्किए में फिर आया भूकंप ...,"['turkey', 'earthquake', 'turkiye', 'india', '...",0,0,17,False,und,,,Twitter for Android


Now we need to clean the data. The full plan is to remove URLs, hashtags, mentions and emojis, strip punctuation, and convert the text to lowercase; the function below covers mentions, the `#` symbol and extra whitespace.

### Data cleaning

In [8]:
def clean_text(text):
    text = re.sub(r'@\w+', '', text) # remove mentions (\w also covers underscores in handles)
    text = re.sub(r'#', '', text) # remove the '#' symbol but keep the hashtag text
    text = re.sub(r'\s+', ' ', text) # collapse runs of whitespace into a single space
    return text

data['content'] = data['content'].apply(clean_text)

# Transform the date column to datetime format
data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d %H:%M:%S') # same format as in the main file

# Keep only English tweets (to be removed once we have more data).
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added later.
data = data[data['language'] == 'en'].copy()
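
The cleaning plan also mentions URLs, which `clean_text` does not handle yet. A minimal sketch of how that step could look is shown here, using only the standard `re` module. The helper name `clean_tweet` and the URL pattern are assumptions for illustration, not part of the original notebook. Lowercasing is deliberately left out, on the assumption that the later city lookup depends on capitalised names.

```python
import re

# Hypothetical patterns: a simple URL matcher and a Twitter-style mention.
URL_RE = re.compile(r'https?://\S+|www\.\S+')
MENTION_RE = re.compile(r'@\w+')


def clean_tweet(text):
    """Strip URLs, mentions and '#' signs, then collapse whitespace."""
    if not isinstance(text, str):
        return ''  # missing content (e.g. NaN) becomes an empty string
    text = URL_RE.sub('', text)
    text = MENTION_RE.sub('', text)
    text = text.replace('#', '')  # keep the hashtag word, drop the symbol
    return re.sub(r'\s+', ' ', text).strip()


print(clean_tweet("Aid for #Hatay via @some_user https://t.co/abc123  now"))
```

Applied to the DataFrame, this would be used the same way as the existing function, for example `data['content'].apply(clean_tweet)`.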




We now need to work out which place each tweet refers to and fill in the missing location values (the `coordinates` and `place` columns are empty in the sample rows above). One way to do this is simple natural language processing: we look for a city or other specific location mentioned in the tweet text.

### Location extraction

In [19]:
# Create a new column for location mentioned in text.
data['location'] = data['content'].apply(lambda x: geotext.GeoText(x).cities)



data['location'] = data['location'].apply(lambda x: x[0] if len(x) > 0 else None)

# Create a new column with the distance from location after finding *km or *miles in text.
data['distance'] = data['content'].apply(lambda x: re.findall(r'\d+km|\d+miles', x))

# Extract the number from the distance column and convert it to float.
data['distance'] = data['distance'].apply(lambda x: float(re.findall(r'\d+', x[0])[0]) if len(x) > 0 else None)

# Print the amount of tweets with location and distance.
print(f"Amount of tweets with location: {len(data[data['location'].notnull()])}")
print(f"Amount of tweets with distance: {len(data[data['distance'].notnull()])}")

print(data[data["location"].notnull()])



Amount of tweets with location: 1315
Amount of tweets with distance: 31
                     date                                            content  \
22    2023-02-21 03:19:42  Monday’s earthquake, this time with a magnitud...   
27    2023-02-21 03:14:38  Earthquake (deprem) M3.9 strikes 38 km SE of M...   
32    2023-02-21 03:10:46  Egypt sends aid ship to quake victims carrying...   
42    2023-02-21 03:02:13  Ankara: Earthquake tremors have been felt once...   
52    2023-02-21 02:55:17  🔔Earthquake (deprem) M2.9 occurred 20 km W of ...   
...                   ...                                                ...   
9962  2023-02-14 14:56:16  Big Kildare Interview: Turkish man in Naas hel...   
9963  2023-02-14 14:55:04  Earthquake (deprem) M2.7 strikes 21 km SW of E...   
9967  2023-02-14 14:52:02  Deadly earthquakes have left children in Syria...   
9998  2023-02-14 14:34:16  Earthquake 35 km W of Latakia (Syria) 50 min a...   
9999  2023-02-14 14:33:56  🔔Earthquake (زلزال) M
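
Several tweets in the output above write the distance with a space, as in "38 km SE of", which the pattern `\d+km|\d+miles` does not match. A minimal sketch of a more tolerant extraction follows. The helper name `extract_distance_km` and the choice to convert miles to kilometres are assumptions for illustration, and the counts printed above would change if this were used instead.

```python
import re

# A number (optionally with decimals), optional whitespace, then the unit.
DISTANCE_RE = re.compile(r'(\d+(?:\.\d+)?)\s*(km|miles)\b', re.IGNORECASE)
MILES_TO_KM = 1.609344


def extract_distance_km(text):
    """Return the first distance found in `text` in kilometres, or None."""
    match = DISTANCE_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2).lower() == 'miles':
        value *= MILES_TO_KM  # normalise everything to kilometres
    return value


print(extract_distance_km("Earthquake (deprem) M3.9 strikes 38 km SE of town"))
```

Because the unit must directly follow the number (apart from whitespace), the magnitude "M3.9" in the example is skipped and the distance is picked up instead.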

Now that we have some tweets with locations, we can plot them on a map. We will use the folium package to do this.

### Map plotting

In [25]:
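# Avoid naming the map `map`, which would shadow Python's built-in function.
quake_map = folium.Map(location=[37.225, 37.021], zoom_start=6)

# "house-crack" is a Font Awesome icon, so the prefix must be "fa" (the default is "glyphicon").
icon = folium.Icon(color="red", icon="house-crack", prefix="fa")

folium.Marker(location=[37.225, 37.021], icon=icon).add_to(quake_map)
quake_map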
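
The cell above places a single marker. To plot the extracted cities, each city name first needs coordinates. The sketch below shows one way this could be done: resolve every distinct name once through a pluggable lookup function. The names `geocode_unique` and `make_nominatim_lookup` are hypothetical helpers, not part of the notebook. The Nominatim-backed lookup needs network access, and the one-second delay is an assumption intended to respect the public service's rate limit.

```python
def geocode_unique(names, lookup):
    """Return {name: (lat, lon)} for each distinct name that `lookup` resolves."""
    coords = {}
    seen = set()
    for name in names:
        if name is None or name in seen:
            continue  # skip missing values and names already looked up
        seen.add(name)
        result = lookup(name)
        if result is not None:
            coords[name] = result
    return coords


def make_nominatim_lookup(user_agent):
    """Build a lookup backed by geopy's Nominatim geocoder (needs network access)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    # Wait at least one second between requests.
    geocode = RateLimiter(Nominatim(user_agent=user_agent).geocode, min_delay_seconds=1)

    def lookup(name):
        location = geocode(name)
        if location is None:
            return None  # the geocoder found nothing for this name
        return (location.latitude, location.longitude)

    return lookup
```

With a lookup in hand, `geocode_unique(data['location'].dropna(), lookup)` would give one coordinate pair per city, and each pair could then be passed to `folium.Marker(location=[lat, lon], popup=name)` and added to the map object.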
