<a href="https://colab.research.google.com/github/bernaberb/BotAccidentesAviacion/blob/main/Aviation_Accidents_Bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[This bot](https://twitter.com/AirAccidentsBot) tweets about plane crashes, using a dataset from [Kaggle](https://www.kaggle.com/saurograndi/airplane-crashes-since-1908) that covers accidents between 1908 and 2009.

I filtered the dataset to keep only the accidents that occurred between 1950 and 2009 and caused more than 30 fatalities.
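
As a minimal sketch of this filtering step, the snippet below applies the same two conditions to a tiny made-up table. The column names `Date` and `Fatalities` follow the Kaggle CSV; the helper `filter_crashes` and the sample rows are hypothetical and only serve as illustration.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the real dataset.
sample = pd.DataFrame({
    'Date': ['03/15/1948', '07/04/1965', '11/23/1999', '01/09/2005'],
    'Fatalities': [52.0, 30.0, 88.0, 12.0],
})

def filter_crashes(frame, min_year=1950, min_fatalities=30):
    """Keep rows from min_year onward with more than min_fatalities deaths."""
    out = frame.copy()
    out['Date'] = pd.to_datetime(out['Date'], format='%m/%d/%Y')
    mask = (out['Date'].dt.year >= min_year) & (out['Fatalities'] > min_fatalities)
    return out[mask]

filtered = filter_crashes(sample)
print(filtered)
```

Building a boolean mask and indexing with it leaves the original frame untouched, which makes the step easy to rerun with different thresholds.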

Using the airline and flight number, I query the [Wikipedia API](https://en.wikipedia.org/w/api.php) to get the link to the article about the accident.
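
The lookup can be sketched roughly as follows, assuming the `wikipedia` Python package and its `wikipedia.page(...)` call. The helpers `build_query` and `find_article_url`, and the `page_lookup` parameter, are hypothetical names introduced here so the idea can be tried without a network connection.

```python
def build_query(operator, flight):
    """Join airline and flight number into one search string."""
    return f'{operator} {flight}'.strip()

def find_article_url(operator, flight, page_lookup=None):
    """Return the URL of the first matching article, or '' if the lookup fails."""
    if page_lookup is None:
        # Imported lazily so the sketch loads even without the package installed.
        import wikipedia
        page_lookup = wikipedia.page
    try:
        return page_lookup(build_query(operator, flight)).url
    except Exception:
        # Missing pages, disambiguation pages, and network errors all end up here.
        return ''

# Offline demonstration with a stand-in lookup (hypothetical; no network call).
class FakePage:
    url = 'https://en.wikipedia.org/wiki/Example'

print(find_article_url('Example Air', '123', page_lookup=lambda query: FakePage()))
```

Returning an empty string on failure means the tweet can still be posted without a link.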

I then build a text with all the information and tweet it from [this Twitter account](https://twitter.com/AirAccidentsBot) using the [Twitter API](https://developer.twitter.com/en/docs/twitter-api) and [Tweepy](https://www.tweepy.org/).
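
A minimal sketch of how the text might be assembled is shown below. The helper `compose_tweet` and the sample values are hypothetical; the wording mirrors the sentence the bot tweets, and the link is simply left out when none was found.

```python
def compose_tweet(year, operator, flight, location, route, fatalities, link=''):
    """Build the tweet text; the link is appended only when it is not empty."""
    text = (f'On a day like today in {year}, {operator} {flight} flight crashed '
            f'near {location} while doing the route {route} '
            f'causing {fatalities} fatalities.')
    # strip() removes the trailing space left behind by an empty link.
    return f'{text} {link}'.strip()

# Made-up values for illustration only.
tweet = compose_tweet(1970, 'Example Air', '123', 'Springfield', 'A - B', 45)
print(tweet)
```

Keeping the text construction in one function makes it possible to check the result before anything is sent to the Twitter API.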

In [None]:
# Installing needed libraries
!pip install wikipedia
!pip install tweepy --upgrade

In [2]:
import pandas as pd 

from datetime import date, timedelta, datetime
from dateutil.relativedelta import relativedelta
from google.colab import drive

In [3]:
# Mounting Google Drive and loading the dataset

drive.mount('/content/drive')

df = pd.read_csv('../content/drive/MyDrive/Colab Notebooks/Mios Definitivos/Airplane_Crashes_and_Fatalities_Since_1908.csv')

In [None]:
# Getting familiar with data

df.sample(5)

In [None]:
# Dropping the columns I don't need

df.drop(['Time', 'Type', 'Registration', 'cn/In', 'Aboard', 'Ground', 'Summary'], axis=1, inplace=True)


In [None]:
# Removing rows containing null values

df.dropna(inplace=True)

# Let's see how it looks now

df.sample(5)

In [None]:
# Keeping only the entries with more than 30 fatalities

df.drop(df.index[df['Fatalities'] <= 30], inplace=True)

In [None]:
# The 'Date' column holds strings, so I convert it to datetime and then add separate month, day, and year columns.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Year'] = df['Date'].dt.year

In [None]:
# Deleting entries before 1950

df.drop(df.index[df['Year'] < 1950], inplace=True)

In [None]:
# Saving today's day and month (read the clock once so both values come from the same instant)

today = datetime.now()
currentDay = today.day
currentMonth = today.month

In [None]:
# Creating a filter to keep the entries whose day and month match today's.

isToday = (df['Day']==currentDay) & (df['Month']==currentMonth)

# Applying the filter

dfHoy = df[isToday]

# Let's see if there is any match for today

print(dfHoy)


In [None]:
# Storing in variables all the information I need. If there is more than one entry for today, it keeps the first one.
# Note: iloc[0] raises an IndexError when there is no match for today.

row = dfHoy.iloc[0]
year = row['Year']
location = row['Location']
operator = row['Operator']
flight = row['Flight #']
route = row['Route']
fatalities = int(row['Fatalities'])


In [None]:
import wikipedia

# Looking for the Wikipedia link based on the airline and flight number. It takes the first match. If nothing is found or the lookup fails, the link stays empty.

wikiSearch = operator + ' ' + flight
try:
    linkWiki = wikipedia.page(wikiSearch).url
except Exception:
    linkWiki = ''
print(linkWiki)

In [None]:
# Creating the tweet

textTweet = (f'On a day like today in {year}, {operator} {flight} flight crashed near {location} '
             f'while doing the route {route} causing {fatalities} fatalities. {linkWiki}')

print(textTweet)

In [None]:
import tweepy

# Now let's tweet it using Tweepy!

# API keys from Twitter (posting requires the user-context credentials below)

client = tweepy.Client(consumer_key='XXX',
                       consumer_secret='XXX',
                       access_token='XXX',
                       access_token_secret='XXX')

# Tweeting!

response = client.create_tweet(text=textTweet)

print(response)
