# <center>🌎 ANALYSIS OF COVID-19 DATA</center>

This notebook walks through a geographic analysis of COVID-19 tweet data, organized into the sections below.
## **Content**

1. [Data and library loading](#1)
2. [Visualizing and Understanding the Data](#2)
3. [Preprocessing/Data Cleaning](#3)
4. [Data Visualization](#4)
    * [Valid Tweets](#5)
    * [Top 10 Countries with Most Tweets](#6)
    * [10 Countries with Least Tweets](#7)
    * [Top 15 Countries with Most Tweets: Different Representation](#8)
    * [Geo-MAP](#9)
5. [Conclusion](#10)

<a id="1"></a> <br>
# <div class="alert alert-block alert-info">Data and library loading</div>

In [None]:
# Import the libraries needed for the analysis of the dataset
!pip install calmap

from datetime import date
import os
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import geopandas as gpd
import geoplot
from geopy import Nominatim
import folium
import mapclassify
import plotly.express as px 
import plotly.graph_objs as go 
from mpl_toolkits.axes_grid1 import make_axes_locatable
from folium.plugins import HeatMapWithTime, TimestampedGeoJson
import matplotlib.style as style 
style.use('fivethirtyeight')
import numpy as np; np.random.seed(sum(map(ord, 'calmap')))
import pandas as pd
import calmap
from shapely.geometry import Polygon
from shapely.geometry import MultiPolygon
        
# Load the tweets dataset
covid_tweets_data = pd.read_csv('../input/covid19-tweets/covid19_tweets.csv')

<a id="2"></a> <br>
# <div class="alert alert-block alert-info">Visualizing and Understanding the Data</div>

The tweets were collected with the Twitter API and a Python script. A query for the high-frequency hashtag #covid19 is run daily over a set time period to collect a larger sample of tweets.

**Content:** every tweet carries the #covid19 hashtag. Collection started on 25/7/2020 with an initial batch of 17k tweets and will continue on a daily basis.

* The collection script can be found here: https://github.com/gabrielpreda/covid-19-tweets

View the recently imported dataset:

In [None]:
covid_tweets_data.head()

The dataset contains the following columns and data types:

1. user_name         **(object)**
2. user_location     **(object)**
3. user_description  **(object)**
4. user_created      **(object)**
5. user_followers    **(int64)** 
6. user_friends      **(int64)** 
7. user_favourites   **(int64)** 
8. user_verified     **(bool)**  
9. date              **(object)**
10. text              **(object)**
11. hashtags          **(object)**
12. source            **(object)**
13. is_retweet        **(bool)**  

> dtypes: bool(2), int64(3), object(8)

In [None]:
nRow, nCol = covid_tweets_data.shape
print(f'There are {nRow} rows and {nCol} columns')

In [None]:
covid_tweets_data.info()

In [None]:
covid_tweets_data.describe()

<a id="3"></a> <br>
# <div class="alert alert-block alert-info">Preprocessing/Data Cleaning</div>

In [None]:
# World City Dataset

cities = pd.read_csv('../input/world-cities-datasets/worldcities.csv')

In [None]:
# Copy the raw user location into a working column and add an empty country column

covid_tweets_data["location"] = covid_tweets_data["user_location"]
covid_tweets_data["country"] = None  # object column, filled in during feature engineering


# Removing Missing Values

In [None]:
user_location = covid_tweets_data['location'].fillna(value='').str.split(',')

# Feature Engineering (Countries Where Users Tweet)

In [None]:
# Row-aligned columns of the cities table (one entry per city)
lat = cities['lat'].fillna(value = '').values.tolist()
lng = cities['lng'].fillna(value = '').values.tolist()
country = cities['country'].fillna(value = '').values.tolist()

# One row per country, so the three lists below share the same index:
# position i in the ISO lists refers to position i in world_city_country
countries_unique = cities.drop_duplicates(subset='country')

# All alpha-3 codes
world_city_iso3 = countries_unique['iso3'].str.lower().str.strip().tolist()

# All alpha-2 codes
world_city_iso2 = countries_unique['iso2'].str.lower().str.strip().tolist()

# All country names
world_city_country = countries_unique['country'].str.lower().str.strip().tolist()

# All admin (state/province) names, row-aligned with `country`
world_states = cities['admin_name'].str.lower().str.strip().tolist()

# All city names, row-aligned with `country`
world_city = cities['city'].fillna(value = '').str.lower().str.strip().values.tolist()



In [None]:

# Map each tweet's comma-separated location tokens to a country.
# Every token is checked in turn, so a later matching token overwrites an earlier one.
for ind, tokens in enumerate(user_location):
    for each in tokens:
        each = each.lower().strip()
        # .at writes to the DataFrame directly and avoids chained assignment
        if each in world_city:
            order = world_city.index(each)
            covid_tweets_data.at[ind, 'country'] = country[order]
        elif each in world_states:
            order = world_states.index(each)
            covid_tweets_data.at[ind, 'country'] = country[order]
        elif each in world_city_country:
            order = world_city_country.index(each)
            covid_tweets_data.at[ind, 'country'] = world_city_country[order]
        elif each in world_city_iso2:
            order = world_city_iso2.index(each)
            covid_tweets_data.at[ind, 'country'] = world_city_country[order]
        elif each in world_city_iso3:
            order = world_city_iso3.index(each)
            covid_tweets_data.at[ind, 'country'] = world_city_country[order]
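
The matching loop above scans Python lists with `in` and `.index()`, which is a linear search for every token. As a rough sketch of the same idea with constant-time lookups, the tokens can be resolved through a single dictionary. The sample rows, `build_lookup` and `resolve_country` below are hypothetical names made up for illustration; they stand in for `worldcities.csv` and are not part of the notebook.

```python
# Hypothetical sample rows standing in for worldcities.csv:
# (city, admin_name, country, iso2, iso3)
SAMPLE_ROWS = [
    ("Tokyo", "Tokyo", "Japan", "JP", "JPN"),
    ("New York", "New York", "United States", "US", "USA"),
    ("Mumbai", "Maharashtra", "India", "IN", "IND"),
    ("Lagos", "Lagos", "Nigeria", "NG", "NGA"),
]


def build_lookup(rows):
    """Map lower-cased city, state, country and ISO codes to a country name."""
    lookup = {}
    # Columns are visited in the same priority order as the loop above:
    # city, admin name, country, alpha-2 code, alpha-3 code.
    for column in range(5):
        for row in rows:
            # setdefault keeps the first match, like list.index() does
            lookup.setdefault(row[column].lower().strip(), row[2])
    return lookup


def resolve_country(location, lookup):
    """Return the country for a free-text location, or None if nothing matches."""
    result = None
    for token in (location or "").split(","):
        match = lookup.get(token.lower().strip())
        if match is not None:
            result = match  # a later matching token overwrites an earlier one
    return result


lookup = build_lookup(SAMPLE_ROWS)
print(resolve_country("Mumbai, India", lookup))
print(resolve_country("Brooklyn, USA", lookup))
print(resolve_country("Somewhere else", lookup))
```

Building the dictionary once and calling `lookup.get` per token keeps the cost per tweet independent of the size of the cities table, which matters when the tweets dataset grows daily.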


<a id="4"></a> <br>
# <div class="alert alert-block alert-info">Data visualizations</div>

**<a id="5">Valid Tweets</a>**

In [None]:
# A tweet is valid when a country could be assigned to it
print('Total number of valid tweets available:', covid_tweets_data['country'].notnull().sum())

## **<a id="6">Top 10 Countries with Most Tweets</a>**

In [None]:
tweet_per_country = covid_tweets_data['country'].str.lower().dropna()
tw = tweet_per_country.value_counts().rename_axis('Country').reset_index(name='Tweet Count')
print(tw)
plt.rcParams['figure.figsize'] = (15,10)
plt.title('Top 10 Countries with Most Tweets',fontsize=15)
sns.set_palette("husl")
ax = sns.barplot(y=tw['Country'].head(10),x=tw['Tweet Count'].head(10))

## **<a id="7">10 Countries with Least Tweets</a>**

In [None]:
tweet_per_country = covid_tweets_data['country'].str.lower().dropna()
tw = tweet_per_country.value_counts().rename_axis('Country').reset_index(name='Tweet Count')
print(tw)
plt.rcParams['figure.figsize'] = (15,10)
plt.title('10 Countries with Least Tweets',fontsize=15)
sns.set_palette("husl")
ax = sns.barplot(y=tw['Country'].tail(10),x=tw['Tweet Count'].tail(10))

**Earliest and Latest Dates in the Dataset**

In [None]:
print(covid_tweets_data["date"].min())
print(covid_tweets_data["date"].max())
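
Because `date` is loaded as an `object` (string) column, `min()` and `max()` compare text rather than timestamps. That gives the right answer only when the strings sort chronologically. The sketch below parses the values first; the `YYYY-MM-DD HH:MM:SS` layout, the `date_range` helper and the sample strings are assumptions for illustration and should be checked against the actual CSV.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the `date` column


def date_range(date_strings, fmt=DATE_FORMAT):
    """Parse the strings and return the earliest and latest timestamps."""
    parsed = [datetime.strptime(s, fmt) for s in date_strings]
    return min(parsed), max(parsed)


# Made-up sample values, not taken from the dataset
sample = ["2020-07-25 12:27:21", "2020-08-01 00:00:05", "2020-07-26 23:59:59"]
first, last = date_range(sample)
print(first, last)
```

In the notebook itself, converting the column with `pd.to_datetime(covid_tweets_data["date"])` before calling `min()` and `max()` would serve the same purpose.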

## **<a id="8">Top 15 Countries with Most Tweets: Different Representation</a>**

In [None]:
country_graph_03=px.bar(x='Tweet Count',y='Country',data_frame=tw[:15],color='Country')
country_graph_03.show()

## **<a id="9">Geo-MAP</a>**

In [None]:
geolocator = Nominatim(user_agent="covid19-application")

In [None]:
def visualize_Global_Corona_map(df,  zoom):
    # Fixed coordinates used as the initial centre of the map
    lat_map=30.038557
    lon_map=31.231781
    f = folium.Figure(width=1000, height=500)
    m = folium.Map([lat_map,lon_map], zoom_start=zoom).add_to(f)
    print(df["Country"])
    for i in range(0,len(df)):
        t_country=str(df["Country"][i])
        location = geolocator.geocode(t_country)
        if location is None:
            # The geocoder found no match for this name, so no marker is drawn
            continue
        popup_text='<i>Location:'+t_country+', Tweets: '+str(df["Tweet Count"][i])+'</i>'
        folium.Marker(location=[location.latitude,location.longitude],popup=popup_text,icon=folium.Icon(icon_color='white',icon ='virus',prefix='fa')).add_to(m)
    
    return m

In [None]:
visualize_Global_Corona_map(tw, 1)
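
The map function sends one geocoding request per row, back to back. A possible refinement, sketched below under stated assumptions, is to geocode each distinct name once, pause between requests, and skip names the geocoder cannot resolve. `geocode_countries` and `fake_geocode` are hypothetical helpers, and the coordinates in the stub are made-up approximate values; in the notebook, `geolocator.geocode` would be passed in place of the stub.

```python
import time
from types import SimpleNamespace


def geocode_countries(names, geocode, delay_seconds=1.0):
    """Geocode each distinct name once and skip names with no match."""
    coords = {}
    # dict.fromkeys removes duplicates while keeping the original order
    for i, name in enumerate(dict.fromkeys(names)):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(name)
        if location is None:
            continue  # unresolved name: leave it out of the result
        coords[name] = (location.latitude, location.longitude)
    return coords


def fake_geocode(name):
    """Offline stand-in for a real geocoder, with made-up coordinates."""
    table = {"egypt": (26.8, 30.8), "japan": (36.2, 138.3)}
    hit = table.get(name)
    if hit is None:
        return None
    return SimpleNamespace(latitude=hit[0], longitude=hit[1])


coords = geocode_countries(
    ["egypt", "atlantis", "japan", "egypt"], fake_geocode, delay_seconds=0
)
print(coords)
```

The one-second default reflects the assumption that a public geocoding service expects clients to throttle their requests. geopy also ships a `RateLimiter` wrapper in `geopy.extra.rate_limiter` that can play a similar role.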

# Conclusion <a id="10"></a>
This concludes the geographical deep-dive analysis. To go further, click the blue "Fork Notebook" button at the top of this kernel. This will create a copy of the code and environment for you to edit. Delete, modify, and add code as you please. Happy Kaggling! For more, follow me, give me a star, or contact me at [Safdar Khan](https://www.safdarhan.ml) or [click here to email me](mailto:safdarkhanofficial@gmail.com).