# Map data visualisation

This notebook visualizes the geolocation of the tweets in the dataset in various ways.

First, we import the Python packages used in this project.

### Imports 

In [4]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import re #for regular expression and data cleaning
import geotext #for extracting location from text
from geopy.geocoders import Nominatim #for geocoding
import folium #for map visualization

Now we load the data.

### Data extraction

In [2]:
data = pd.read_csv("./archive/translated_dataset2.csv")
data.head()

Unnamed: 0,date,content,hashtags,like_count,rt_count,followers_count,isVerified,language,coordinates,place,source,city_mention,translated_content
0,2023-02-20 23:37:52+00:00,Hatay'da 6.4 şiddetinde meydana gelen deprem A...,"['Uyarı', 'depremoldu', 'Batman', 'Adana', 'Ma...",5.0,0.0,881.0,False,tr,,,Twitter for Android,"['Adana', 'Batman', 'Adana']","In Hatay, the earthquake occurred in the sever..."
1,2023-02-20 23:17:39+00:00,"#Deprem Bu akşam #hatay da saat 20:04’de 6,4 b...","['Deprem', 'hatay', 'İdlib', 'Earthquake', 'Id...",1.0,0.0,90.0,False,tr,,,Twitter Web App,['Idlib'],"#Deprem This evening #hatay had 6,4 earthquake..."
2,2023-02-20 21:55:15+00:00,Siz şakamısınız okullar kapanınca okullar açıl...,"['Antalya', 'earthquake', 'okullarkapansin', '...",1.0,0.0,2.0,False,tr,,,Twitter for Android,['Antalya'],"If you are hungry, schools are closed, schools..."
3,2023-02-20 21:54:18+00:00,Deprem bölgesi Antakya'da ilk depremden beri o...,"['earthquake', 'hataydeprem', 'turkey']",0.0,1.0,2034.0,False,tr,,,Twitter for Android,['Antakya'],Earthquake region has been so artifacturing si...
4,2023-02-20 21:11:49+00:00,#Afet #çadır'ına 5 #ETH istemekte nedir? Bi ne...,"['Afet', 'çadır', 'ETH', 'deprem', 'hatay', 'e...",1.0,0.0,17.0,False,tr,,,Twitter for Android,['Adana'],English What to ask 5#ETH to Afet #çadur? When...


Next we clean the text. The cell below removes mentions, strips the `#` symbol from hashtags, and collapses repeated whitespace. Removing URLs, emojis, and punctuation, and converting the text to lowercase, are further steps that this cell does not yet perform.

### Data cleaning

In [3]:
def clean_text(text):
    text = re.sub(r'@\w+', '', text) # remove mentions (handles may contain underscores)
    text = re.sub(r'#', '', text) # remove the '#' symbol but keep the hashtag text
    text = re.sub(r'\s+', ' ', text).strip() # collapse extra whitespace
    return text

data['translated_content'] = data['translated_content'].apply(clean_text)

# Parse the date column, then store it as a 'YYYY-MM-DD HH:MM:SS' string
# (the timezone offset is dropped), using the same format as the main file
data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d %H:%M:%S')
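
The remaining cleaning steps listed in the text (URLs, emojis, punctuation, lowercase) could be sketched as follows. This is only an illustration: `clean_text_full` is a hypothetical helper that is not part of the notebook, and the emoji ranges are a rough assumption rather than a complete list.

```python
import re
import string

URL_RE = re.compile(r'https?://\S+|www\.\S+')
MENTION_RE = re.compile(r'@\w+')
# Rough emoji ranges (assumption: covers common pictographs, symbols and flags only)
EMOJI_RE = re.compile('[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF]')
# Translation table that deletes ASCII punctuation, including '#'
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def clean_text_full(text):
    """Hypothetical fuller cleaner: URLs, mentions, emojis, punctuation, case."""
    if not isinstance(text, str):
        return ''  # missing values (NaN) become an empty string
    text = URL_RE.sub('', text)        # remove URLs first, before punctuation is stripped
    text = MENTION_RE.sub('', text)    # remove mentions
    text = EMOJI_RE.sub('', text)      # remove emojis in the ranges above
    text = text.translate(PUNCT_TABLE) # remove ASCII punctuation
    text = re.sub(r'\s+', ' ', text).strip()  # collapse whitespace
    return text.lower()


cleaned = clean_text_full("Hello @user_1! Visit https://t.co/abc #Hatay 🏠 now")
```

Note that stripping punctuation also turns a magnitude such as `6.4` into `64`, and lowercasing changes place names, so these two steps may be better applied to a copy of the column than to the text used for location extraction.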




We now need to find where the tweets are located and fill in the missing values (the `coordinates` and `place` columns are empty in the rows shown above). This can be done with natural language processing: we look for the city or specific location mentioned in each tweet and geocode it.

### Location extraction

In [4]:

# Find distances written as e.g. "5km" or "10miles" in the text (no space between number and unit).
data['distance'] = data['translated_content'].apply(lambda x: re.findall(r'\d+km|\d+miles', x))

# Keep the number of the first match as a float (None if there is no match); the unit is not kept.
data['distance'] = data['distance'].apply(lambda x: float(re.findall(r'\d+', x[0])[0]) if len(x) > 0 else None)

# Create a geolocator object with a custom user_agent
geolocator = Nominatim(user_agent="my-custom-user-agent")

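import ast  # used to parse the stringified lists stored in city_mention

# city_mention is read from the CSV as a string such as "['Adana', 'Batman', 'Adana']".
# Parse it and remove duplicate city names within each list, keeping their order.
def unique_cities(value):
    try:
        cities = ast.literal_eval(value) if isinstance(value, str) else value
        return list(dict.fromkeys(cities))
    except (ValueError, SyntaxError, TypeError):
        return []

data['city_mention'] = data['city_mention'].apply(unique_cities)

# Define a function to get the coordinates of a location.
# Simplifying assumption: the first mentioned city is used as the tweet's location.
def get_coordinates(row):
    try:
        # geocode() returns None when nothing is found, which raises AttributeError below
        location = geolocator.geocode(row['city_mention'][0])
        return pd.Series({'latitude': location.latitude, 'longitude': location.longitude})
    except Exception:
        # No city mentioned, no match, or a geocoder/network error
        return pd.Series({'latitude': None, 'longitude': None})

# Apply get_coordinates to every row to create the latitude and longitude columns
data[['latitude', 'longitude']] = data.apply(get_coordinates, axis=1)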

# Print the amount of tweets with location and distance.
print(f"Amount of tweets with coordinates: {len(data[data['latitude'].notnull()])}")
print(f"Amount of tweets with distance: {len(data[data['distance'].notnull()])}")

# save the data to a new csv file called "full_data.csv"
data.to_csv("full_data.csv", index=False)


Amount of tweets with coordinates: 24800
Amount of tweets with distance: 10
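
The geocoding step above sends one request per row, although many tweets mention the same few cities. A possible way to reduce the number of requests is to geocode each distinct city once and reuse the result. The sketch below assumes the `city_mention` values are stringified lists, as in the table shown earlier; `parse_cities` and `geocode_unique` are hypothetical helpers, and `geocode` is any callable that takes a city name (for example `geolocator.geocode`).

```python
import ast
import time


def parse_cities(raw):
    """Turn a string like "['Adana', 'Batman', 'Adana']" into unique names, in order."""
    if not isinstance(raw, str):
        return []  # missing values (NaN) yield no cities
    try:
        cities = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(cities, list):
        return []
    # dict.fromkeys keeps the first occurrence of each name and preserves order
    return list(dict.fromkeys(c for c in cities if isinstance(c, str)))


def geocode_unique(city_lists, geocode, delay=1.0):
    """Call geocode once per distinct city and return a {city: result} cache."""
    cache = {}
    for cities in city_lists:
        for city in cities:
            if city not in cache:
                cache[city] = geocode(city)
                time.sleep(delay)  # pause between requests to stay polite to the service
    return cache


# Offline demonstration with a stand-in geocoder that records its calls
calls = []

def fake_geocode(city):
    calls.append(city)
    return (0.0, 0.0)  # placeholder result, not real coordinates

city_lists = [
    parse_cities("['Adana', 'Batman', 'Adana']"),
    parse_cities("['Adana']"),
    parse_cities(float('nan')),
]
cache = geocode_unique(city_lists, fake_geocode, delay=0)
```

The one-second default delay reflects the public Nominatim usage policy of at most one request per second; geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that could be used instead of the manual `time.sleep`.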


Now that we have some tweets with locations, we can plot them on a map. We will use the folium package to do this.

### Map plotting

In [5]:

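data = pd.read_csv("full_data.csv")

# Reference point used as the map centre and marked in red
center = [37.225, 37.021]
m = folium.Map(location=center, zoom_start=6)  # named m to avoid shadowing the built-in map()

# Red circle around the reference point; radius is in metres (1,000,000 m = 1,000 km)
folium.Circle(location=center, radius=1000000, color="red", fill=True, fill_color="red").add_to(m)

# "house-crack" is a Font Awesome icon, so the "fa" prefix is needed for it to render
icon = folium.Icon(color="red", icon="house-crack", prefix="fa")
folium.Marker(location=center, icon=icon).add_to(m)

# Add one marker per tweet that has coordinates
for row in data.dropna(subset=['latitude', 'longitude']).itertuples():
    folium.Marker(location=[row.latitude, row.longitude]).add_to(m)

m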
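
Because the coordinates come from city names, tweets that mention the same city share exactly the same point, so the map cell above stacks many markers on top of each other. One possible alternative is to count tweets per coordinate pair and draw a single marker per location. The sketch below uses only the standard library; `count_by_location` is a hypothetical helper and the sample coordinates are made up for illustration.

```python
import math
from collections import Counter


def count_by_location(rows, precision=4):
    """Count rows per (lat, lon) pair, skipping rows with missing coordinates."""
    counts = Counter()
    for lat, lon in rows:
        if lat is None or lon is None:
            continue  # explicit missing values
        if math.isnan(lat) or math.isnan(lon):
            continue  # NaN, as pandas stores missing floats
        counts[(round(lat, precision), round(lon, precision))] += 1
    return counts


# Made-up sample rows: two tweets at one point, one at another, two without coordinates
sample = [
    (36.9914, 35.3308),
    (36.9914, 35.3308),
    (float('nan'), float('nan')),
    (None, None),
    (36.2021, 36.1603),
]
counts = count_by_location(sample)
```

With the notebook's data, the rows could come from `zip(data['latitude'], data['longitude'])`, and each key of `counts` could then get one `folium.Marker` whose `popup` shows the number of tweets at that location.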



