<a href="https://colab.research.google.com/github/bernaberb/BotAccidentesAviacion/blob/main/Aviation_Accidents_Bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[This bot](https://twitter.com/AirAccidentsBot) tweets about plane crashes, using a [Kaggle dataset](https://www.kaggle.com/saurograndi/airplane-crashes-since-1908) that covers accidents from 1908 to 2009.

I filter the dataset down to the accidents that occurred between 1950 and 2009 and caused 30 or more fatalities.

Using the airline and flight number, I query the [Wikipedia API](https://en.wikipedia.org/w/api.php) to get the link to the corresponding article.

Finally, I build a text with all the information and tweet it through the [Twitter API](https://developer.twitter.com/en/docs/twitter-api) and [Tweepy](https://www.tweepy.org/) from [this Twitter account](https://twitter.com/AirAccidentsBot).

In [None]:
# Installing needed libraries
!pip install wikipedia
!pip install tweepy --upgrade

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11695 sha256=e038344c3e0b774c9e4e18a518ed5ad95d5dd877b0eb93821d74225684641438
  Stored in directory: /root/.cache/pip/wheels/15/93/6d/5b2c68b8a64c7a7a04947b4ed6d89fb557dcc6bc27d1d7f3ba
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
Collecting tweepy
  Downloading tweepy-4.6.0-py2.py3-none-any.whl (69 kB)
     |████████████████████████████████| 69 kB 3.4 MB/s
Collecting requests<3,>=2.27.0
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 595 kB/s
Installing collected packages: requests, tweepy
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.

In [None]:
import pandas as pd

from datetime import datetime
from google.colab import drive

In [None]:
# Mounting Google Drive and loading the dataset

drive.mount('/content/drive')
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Mios Definitivos/Airplane_Crashes_and_Fatalities_Since_1908.csv')

In [None]:
# Getting familiar with data

df.sample(5)

Unnamed: 0,Date,Time,Location,Operator,Flight #,Route,Type,Registration,cn/In,Aboard,Fatalities,Ground,Summary
1817,11/30/1962,21:45,"New York, New York",Eastern Air Lines,512.0,Charlotte - New York City,Douglas DC-7B,N815D,45084/711,51.0,25.0,0.0,"The aircraft was 1,000 ft past the ILS touchdo..."
3561,06/23/1985,,"Juara, Brazil",TABA,,Juara - Cuiaba,Embraer 110P Bandeirante,PT-GJN,110-063,17.0,17.0,0.0,The plane stalled and crashed into an emergenc...
3320,12/05/1981,19:08,"Honolulu, Hawaii",Private - Parajump air show,,"Haliive, HI - Honolulu, HI",Beech C-45H,N8185H,AF-381,12.0,11.0,0.0,Crashed into the water. Improperly loaded airc...
3057,05/08/1978,21:20,"Pensacola, Florida",National Airlines,193.0,"Miami, FL - Pensacola, FL - Mobile, AL",Boeing B-727-235,N4744,19464,58.0,3.0,0.0,The aircraft crashed while attempting a non-pr...
1452,11/24/1956,23:17,"Paris, France",Linee Aeree Italiane,,Rome - Paris - Shannon - New York City,Douglas DC-6B,I-LEAD,45075,35.0,34.0,8.0,Lost altitude on takeoff and crashed into a ho...


In [None]:
# Dropping the columns I don't need

df.drop(['Time', 'Type', 'Registration', 'cn/In', 'Aboard', 'Ground', 'Summary'], axis=1, inplace=True)


In [None]:
# Removing rows containing null values

df.dropna(inplace=True)

# Let's see how it looks now

df.sample(5)

Unnamed: 0,Date,Location,Operator,Flight #,Route,Fatalities
2266,11/22/1968,"San Francisco, California",Japan Air Lines,2,Tokyo - San Francisco,0.0
4357,08/21/1995,"Near Carrollton, GA",AtlantiSoutheast Airlines,529,Atlanta - Gulfport,10.0
4026,07/11/1991,"Jeddah, Saudi Arabia",Nationair (chartered by Nigeria Airways),2120,Jeddah - Sokoto,261.0
3663,01/03/1987,"Abidjan, Ivory Coast",Varig,797,Abidjan - Rio de Janeiro,50.0
4606,12/11/1998,"Near Surat Thani, Thailand",Thai Airways,261,Bangkok - Surat Thani,102.0


In [None]:
# Keeping only the entries with 30 or more fatalities

df.drop(df.index[df['Fatalities'] < 30], inplace=True)

In [None]:
# The 'Date' column holds strings, so I convert it to datetime and add separate month, day, and year columns.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Year'] = df['Date'].dt.year

In [None]:
# Deleting entries before 1950

df.drop(df.index[df['Year'] < 1950], inplace=True)

In [None]:
# Saving today's day and month (one call to now() so both values come from the same instant)

today = datetime.now()
currentDay = today.day
currentMonth = today.month

In [None]:
# Creating a filter to keep the entries whose day and month correspond to current.

isToday = (df['Day']==currentDay) & (df['Month']==currentMonth)

# Applying the filter

dfHoy = df[isToday]

# Let's see if there is any match for today

print(dfHoy)


           Date                                    Location  ... Day  Year
1897 1964-02-25  Lake Pontchartrain, New Orleans, Louisiana  ...  25  1964

[1 rows x 9 columns]
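
The day-and-month match above can also be wrapped in a small function, which makes it easy to try against any date rather than only today's. This is a minimal sketch, not part of the bot itself: the name `accidents_on_this_day` is hypothetical, the three rows are copied from outputs shown elsewhere in this notebook, and the date passed in is arbitrary.

```python
from datetime import date

import pandas as pd


def accidents_on_this_day(df, today):
    """Return the rows of df whose 'Date' shares day and month with today."""
    dates = pd.to_datetime(df['Date'], format='%m/%d/%Y')
    same_day = (dates.dt.day == today.day) & (dates.dt.month == today.month)
    return df[same_day]


# Three rows taken from outputs shown in this notebook.
sample = pd.DataFrame({
    'Date': ['02/25/1964', '07/11/1991', '12/11/1998'],
    'Operator': ['Eastern Air Lines',
                 'Nationair (chartered by Nigeria Airways)',
                 'Thai Airways'],
    'Fatalities': [58.0, 261.0, 102.0],
})

# Any year works here; only the day and month are compared.
matches = accidents_on_this_day(sample, date(2022, 2, 25))
print(matches['Operator'].tolist())
```

Passing the date in as an argument also makes the empty case easy to check: a day with no recorded accident returns an empty frame.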


In [None]:
# Storing all the information I need in variables. If there is more than one entry for today, the first one is kept.

if dfHoy.empty:
    raise SystemExit('No accident on record for today, so there is nothing to tweet.')

row = dfHoy.iloc[0]
year = row['Year']
location = row['Location']
operator = row['Operator']
flight = row['Flight #']
route = row['Route']
fatalities = int(row['Fatalities'])


In [None]:
import wikipedia

# Looking up the Wikipedia article from the airline and flight number. wikipedia.page() resolves the query to the best-matching page; if the lookup fails for any reason, the link is left empty.

wikiSearch = operator + ' ' + str(flight)
try:
  linkWiki = wikipedia.page(wikiSearch).url
except Exception:
  linkWiki = ''
print(linkWiki)

https://en.wikipedia.org/wiki/Eastern_Air_Lines_Flight_304
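
Since the lookup takes whatever page the query resolves to, it may occasionally land on an unrelated article (the airline's own page, for instance). One possible safeguard, sketched here under assumptions, is to go through the search results and accept only a title that contains the flight number. The function `find_article_url` and its `search` and `get_url` parameters are hypothetical; they are passed in so the logic can be tried without network access.

```python
def find_article_url(operator, flight, search, get_url):
    """Return the URL of the first search hit whose title mentions the flight number.

    search(query) must return a list of page titles and get_url(title) must
    return the URL for a title. Both are injected by the caller. An empty
    string is returned when no suitable page is found.
    """
    query = f'{operator} Flight {flight}'
    for title in search(query):
        if str(flight) not in title:
            continue  # skip hits that do not mention this flight
        try:
            return get_url(title)
        except Exception:
            continue  # the page could not be loaded; try the next hit
    return ''


# With the wikipedia package, the call could look like this:
# linkWiki = find_article_url(
#     operator, flight,
#     search=wikipedia.search,
#     get_url=lambda title: wikipedia.page(title, auto_suggest=False).url)
```

Checking for the flight number in the title is a simple heuristic, not a guarantee; short flight numbers such as `2` could still match unrelated titles.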


In [None]:
# Creating the tweet

textTweet = (f'On a day like today in {year}, {operator} {flight} flight crashed near {location} '
             f'while doing the route {route} causing {fatalities} fatalities. {linkWiki}')

print(textTweet)

On a day like today in 1964, Eastern Air Lines 304 flight crashed near Lake Pontchartrain, New Orleans, Louisiana while doing the route Mexico City - New Orleans - New York City causing 58 fatalities. https://en.wikipedia.org/wiki/Eastern_Air_Lines_Flight_304
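
Long routes or locations could push a tweet past the length limit. The sketch below trims the text before the link is appended. It rests on two assumptions: a limit of 280 characters, and every link counting as 23 characters regardless of its real length. It also treats the text as plain Latin characters that count as one each. The name `fit_tweet` and both constants are hypothetical.

```python
TWEET_LIMIT = 280   # assumed maximum tweet length
URL_LENGTH = 23     # assumed length Twitter counts for any link


def fit_tweet(body, link=''):
    """Trim body so that body plus an optional link stays within the limit."""
    # Reserve room for the link and the space in front of it.
    budget = TWEET_LIMIT - (URL_LENGTH + 1 if link else 0)
    if len(body) > budget:
        # Cut one extra character to make room for the ellipsis.
        body = body[:budget - 1].rstrip() + '…'
    return f'{body} {link}' if link else body
```

A text that already fits is returned unchanged, so the function could wrap the tweet built above without affecting the normal case.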


In [None]:
import tweepy

# Now let's tweet it using Tweepy!

# API keys from Twitter. Posting a tweet needs user-context credentials, so the client is built from the consumer and access keys.

client = tweepy.Client(consumer_key='XXX',
                       consumer_secret='XXX',
                       access_token='XXX',
                       access_token_secret='XXX')

# Tweeting!

response = client.create_tweet(text=textTweet)

print(response)


Unauthorized: ignored
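
An `Unauthorized` response like the one above is what I would expect when the `'XXX'` placeholders are sent instead of real keys. To avoid writing the real values into the notebook, one option is to read them from environment variables. This is a minimal sketch: the function `load_twitter_credentials` and the variable names are hypothetical, and only the four keyword names match the `tweepy.Client` call used above.

```python
import os

# Hypothetical environment variable names, mapped to tweepy.Client keywords.
CREDENTIAL_VARS = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}


def load_twitter_credentials(env=None):
    """Return the keyword arguments for tweepy.Client, read from env."""
    env = os.environ if env is None else env
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError('Missing credentials: ' + ', '.join(missing))
    return {keyword: env[var] for keyword, var in CREDENTIAL_VARS.items()}


# Usage with Tweepy:
# client = tweepy.Client(**load_twitter_credentials())
```

Failing early with the list of missing variables gives a clearer message than an authentication error returned by the API.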