In [1]:
import pandas as pd

Import the data from "data/ufos/scrubbed.csv".

In [2]:
df = pd.read_csv("data/ufos/scrubbed.csv")
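
The `NaN` entries that show up in the cells below come from empty fields in the CSV file. As a minimal sketch, assuming a small made-up CSV string in place of the real file (the `csv_text` and `toy` names are hypothetical), `read_csv` turns an empty field into a missing value:

```python
import io

import pandas as pd

# Hypothetical three-row CSV; the second row has an empty country field
csv_text = "city,country\nsan marcos,us\nlackland afb,\nchester,gb\n"

# read_csv accepts a file path or any file-like object such as StringIO
toy = pd.read_csv(io.StringIO(csv_text))

# The empty field is parsed as a missing value (NaN)
missing_flags = toy["country"].isna().tolist()
print(toy)
print(missing_flags)
```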

# Slicing and extracting data

Take a look at the `country` column.

In [5]:
df.country

0         us
1        NaN
2         gb
3         us
4         us
        ... 
80327     us
80328     us
80329     us
80330     us
80331     us
Name: country, Length: 80332, dtype: object

Check how many null (NaN) values there are in that column.

In [7]:
df.country.isnull().sum()
# Alternatively:
# df.country.isna().sum()

9670
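
The count works because `isnull()` returns a boolean Series and `sum()` treats `True` as 1. A small sketch on made-up data (the `toy` frame is hypothetical, not the UFO table) shows the same idea, plus two related calls that may be useful: `mean()` for the share of missing values and `value_counts(dropna=False)` to list missing values alongside the others.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the UFO table
toy = pd.DataFrame({"country": ["us", np.nan, "gb", "us", None]})

is_missing = toy["country"].isna()      # boolean Series, True where the value is missing
n_missing = is_missing.sum()            # True counts as 1, so this is the number of missing values
share_missing = is_missing.mean()       # fraction of rows with a missing value

# value_counts drops missing values by default; dropna=False keeps them
counts = toy["country"].value_counts(dropna=False)

print(n_missing, share_missing)
print(counts)
```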

Extract both the `country` and the `shape` column.

In [9]:
df[['country', 'shape']]

Unnamed: 0,country,shape
0,us,cylinder
1,,light
2,gb,circle
3,us,circle
4,us,light
...,...,...
80327,us,light
80328,us,circle
80329,us,other
80330,us,circle
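
The double brackets matter: the outer pair is the indexing operator and the inner pair is a Python list of column names. A short sketch on a hypothetical `toy` frame shows what each form returns:

```python
import pandas as pd

# Hypothetical toy data with three columns
toy = pd.DataFrame({
    "country": ["us", "gb"],
    "shape": ["light", "circle"],
    "state": ["tx", None],
})

one_col = toy["country"]               # single label -> Series
one_col_frame = toy[["country"]]       # list with one label -> DataFrame
two_cols = toy[["country", "shape"]]   # list of labels -> DataFrame with those columns

print(type(one_col).__name__, type(one_col_frame).__name__)
print(two_cols)
```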


Extract the second row of the dataframe (using `iloc`).

In [12]:
df.iloc[1]

datetime                                              1949-10-10 21:00:00
city                                                         lackland afb
state                                                                  tx
country                                                               NaN
shape                                                               light
duration (seconds)                                                 7200.0
duration (hours/min)                                              1-2 hrs
comments                1949 Lackland AFB&#44 TX.  Lights racing acros...
date posted                                                    12/16/2005
latitude                                                         29.38421
longitude                                                      -98.581082
Name: 1, dtype: object

Do the same but using `loc`.

In [15]:
df.loc[1]

datetime                                              1949-10-10 21:00:00
city                                                         lackland afb
state                                                                  tx
country                                                               NaN
shape                                                               light
duration (seconds)                                                 7200.0
duration (hours/min)                                              1-2 hrs
comments                1949 Lackland AFB&#44 TX.  Lights racing acros...
date posted                                                    12/16/2005
latitude                                                         29.38421
longitude                                                      -98.581082
Name: 1, dtype: object
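
Both calls return the same row here only because the default index labels (0, 1, 2, ...) coincide with the row positions. `iloc` selects by position, `loc` by index label. The following sketch, using a hypothetical `toy` frame, shows how the two diverge once rows have been filtered out:

```python
import pandas as pd

# Hypothetical toy data with the default index 0..3
toy = pd.DataFrame({"city": ["san marcos", "lackland afb", "chester", "edna"]})

# Filtering keeps the original labels, so the index is now 0, 2, 3
subset = toy[toy["city"] != "lackland afb"]

by_position = subset.iloc[1]   # second row by position, which carries label 2
by_label = subset.loc[2]       # row whose index label is 2

# Label 1 was filtered out, so looking it up with loc raises a KeyError
try:
    subset.loc[1]
    label_one_found = True
except KeyError:
    label_one_found = False

print(by_position["city"], by_label["city"], label_one_found)
```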

Extract the first 3 rows using either `iloc` or `loc`.

In [17]:
df.iloc[:3]

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,1949-10-10 20:30:00,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
1,1949-10-10 21:00:00,lackland afb,tx,,light,7200.0,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,1955-10-10 17:00:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667


In [18]:
df.loc[:2]  # loc slices by label and includes the end label, so :2 returns rows 0, 1 and 2

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,1949-10-10 20:30:00,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
1,1949-10-10 21:00:00,lackland afb,tx,,light,7200.0,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,1955-10-10 17:00:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
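
Slices behave differently in the two indexers: `iloc` follows the usual Python convention and excludes the end position, while `loc` includes the end label. A minimal sketch on a hypothetical `toy` frame makes the row counts explicit:

```python
import pandas as pd

# Hypothetical toy data with the default index 0..4
toy = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

first_three_pos = toy.iloc[:3]   # positions 0, 1, 2 (end position excluded)
first_three_lab = toy.loc[:2]    # labels 0, 1, 2 (end label included)
four_rows = toy.loc[:3]          # labels 0, 1, 2, 3 -> four rows, not three
head_three = toy.head(3)         # positional shortcut for the first three rows

print(len(first_three_pos), len(first_three_lab), len(four_rows), len(head_three))
```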


# Conditional slicing

Extract all rows where the UFO was sighted in the US (using the `country` column).

In [20]:
df[df.country=="us"]

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,1949-10-10 20:30:00,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
3,1956-10-10 21:00:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
4,1960-10-10 20:00:00,kaneohe,hi,us,light,900.0,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.418056,-157.803611
5,1961-10-10 19:00:00,bristol,tn,us,sphere,300.0,5 minutes,My father is now 89 my brother 52 the girl wit...,4/27/2007,36.595000,-82.188889
7,1965-10-10 23:45:00,norwalk,ct,us,disk,1200.0,20 minutes,A bright orange color changing to reddish colo...,10/2/1999,41.117500,-73.408333
...,...,...,...,...,...,...,...,...,...,...,...
80327,2013-09-09 21:15:00,nashville,tn,us,light,600.0,10 minutes,Round from the distance/slowly changing colors...,9/30/2013,36.165833,-86.784444
80328,2013-09-09 22:00:00,boise,id,us,circle,1200.0,20 minutes,Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r...,9/30/2013,43.613611,-116.202500
80329,2013-09-09 22:00:00,napa,ca,us,other,1200.0,hour,Napa UFO&#44,9/30/2013,38.297222,-122.284444
80330,2013-09-09 22:20:00,vienna,va,us,circle,5.0,5 seconds,Saw a five gold lit cicular craft moving fastl...,9/30/2013,38.901111,-77.265556


Extract all rows where the duration was below 100 seconds.

In [22]:
df[df["duration (seconds)"] < 100]

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
2,1955-10-10 17:00:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.200000,-2.916667
3,1956-10-10 21:00:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
14,1971-10-10 21:00:00,lexington,nc,us,oval,30.0,30 seconds,green oval shaped light over my local church&#...,2/14/2010,35.823889,-80.253611
18,1973-10-10 23:00:00,bermuda nas,,,light,20.0,20 sec.,saw fast moving blip on the radar scope thin w...,1/11/2002,32.364167,-64.678611
23,1976-10-10 20:30:00,washougal,wa,us,oval,60.0,1 minute,Three extremely large lights hanging above nea...,2/7/2014,45.582778,-122.352222
...,...,...,...,...,...,...,...,...,...,...,...
80321,2013-09-09 20:21:00,clarksville,tn,us,fireball,3.0,3 seconds,Green fireball like object shooting across the...,9/30/2013,36.529722,-87.359444
80322,2013-09-09 21:00:00,aleksandrow (poland),,,light,15.0,15 seconds,Two points of light following one another in a...,9/30/2013,50.465843,22.891814
80323,2013-09-09 21:00:00,gainesville,fl,us,triangle,60.0,1 minute,Three lights in the sky that didn&#39t look li...,9/30/2013,29.651389,-82.325000
80326,2013-09-09 21:00:00,woodstock,ga,us,sphere,20.0,20 seconds,Driving 575 at 21:00 hrs saw a white and green...,9/30/2013,34.101389,-84.519444


Bonus: Extract all sightings that were in the US and lasted less than 100 seconds.

In [23]:
df[(df["country"] == "us") & (df["duration (seconds)"] < 100)]

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
3,1956-10-10 21:00:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
14,1971-10-10 21:00:00,lexington,nc,us,oval,30.0,30 seconds,green oval shaped light over my local church&#...,2/14/2010,35.823889,-80.253611
23,1976-10-10 20:30:00,washougal,wa,us,oval,60.0,1 minute,Three extremely large lights hanging above nea...,2/7/2014,45.582778,-122.352222
25,1977-10-10 12:00:00,san antonio,tx,us,other,30.0,30 seconds,i was about six or seven and my family and me ...,2/24/2005,29.423889,-98.493333
26,1977-10-10 22:00:00,louisville,ky,us,light,30.0,approx: 30 seconds,HBCCUFO CANADIAN REPORT: Pilot Sighting Of Un...,3/17/2004,38.254167,-85.759444
...,...,...,...,...,...,...,...,...,...,...,...
80320,2013-09-09 20:20:00,tuscaloosa,al,us,fireball,60.0,1:00,White/green object much larger than &quot;shoo...,9/30/2013,33.209722,-87.569167
80321,2013-09-09 20:21:00,clarksville,tn,us,fireball,3.0,3 seconds,Green fireball like object shooting across the...,9/30/2013,36.529722,-87.359444
80323,2013-09-09 21:00:00,gainesville,fl,us,triangle,60.0,1 minute,Three lights in the sky that didn&#39t look li...,9/30/2013,29.651389,-82.325000
80326,2013-09-09 21:00:00,woodstock,ga,us,sphere,20.0,20 seconds,Driving 575 at 21:00 hrs saw a white and green...,9/30/2013,34.101389,-84.519444
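
The parentheses around each comparison are required because `&` binds more tightly than `==` and `<` in Python. One way to keep such filters readable is to name each mask first. The sketch below uses a hypothetical `toy` frame and also shows `|` (or) and `~` (not):

```python
import pandas as pd

# Hypothetical toy data; the last row has no country
toy = pd.DataFrame({
    "country": ["us", "gb", "us", None],
    "duration (seconds)": [20.0, 20.0, 2700.0, 15.0],
})

in_us = toy["country"] == "us"               # False where country is missing
short = toy["duration (seconds)"] < 100

both = toy[in_us & short]      # element-wise AND: in the US and short
either = toy[in_us | short]    # element-wise OR: in the US or short
not_us = toy[~in_us]           # element-wise NOT: also keeps rows with a missing country

print(list(both.index), list(either.index), list(not_us.index))
```

Note that `~in_us` includes rows where `country` is missing, since a comparison against a missing value evaluates to `False`.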


How many sightings were in the US?

In [24]:
(df.country == "us").sum()

65114
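
Summing a boolean mask counts the rows that satisfy the condition. If counts for every country are wanted at once, `value_counts()` is an alternative. A small sketch on a hypothetical `toy` frame:

```python
import pandas as pd

# Hypothetical toy data; one country is missing
toy = pd.DataFrame({"country": ["us", "gb", "us", None, "ca", "us"]})

n_us = (toy["country"] == "us").sum()        # number of rows equal to "us"
share_us = (toy["country"] == "us").mean()   # share of all rows, missing ones included
per_country = toy["country"].value_counts()  # counts per value, missing values excluded

print(n_us, share_us)
print(per_country)
```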

# Cleaning data

Drop all rows that have a null value in the `country` column.

In [25]:
df.dropna(subset=["country"])

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,1949-10-10 20:30:00,san marcos,tx,us,cylinder,2700.0,45 minutes,This event took place in early fall around 194...,4/27/2004,29.883056,-97.941111
2,1955-10-10 17:00:00,chester (uk/england),,gb,circle,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.200000,-2.916667
3,1956-10-10 21:00:00,edna,tx,us,circle,20.0,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.978333,-96.645833
4,1960-10-10 20:00:00,kaneohe,hi,us,light,900.0,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.418056,-157.803611
5,1961-10-10 19:00:00,bristol,tn,us,sphere,300.0,5 minutes,My father is now 89 my brother 52 the girl wit...,4/27/2007,36.595000,-82.188889
...,...,...,...,...,...,...,...,...,...,...,...
80327,2013-09-09 21:15:00,nashville,tn,us,light,600.0,10 minutes,Round from the distance/slowly changing colors...,9/30/2013,36.165833,-86.784444
80328,2013-09-09 22:00:00,boise,id,us,circle,1200.0,20 minutes,Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r...,9/30/2013,43.613611,-116.202500
80329,2013-09-09 22:00:00,napa,ca,us,other,1200.0,hour,Napa UFO&#44,9/30/2013,38.297222,-122.284444
80330,2013-09-09 22:20:00,vienna,va,us,circle,5.0,5 seconds,Saw a five gold lit cicular craft moving fastl...,9/30/2013,38.901111,-77.265556
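
`dropna` returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name if it should be kept. The sketch below, on a hypothetical `toy` frame, also shows that the surviving rows keep their original index labels unless the index is reset:

```python
import pandas as pd

# Hypothetical toy data; row 1 has no country, row 2 has no state
toy = pd.DataFrame({
    "country": ["us", None, "gb"],
    "state": ["tx", "tx", None],
})

# Only the country column is checked, so row 2 survives despite its missing state
cleaned = toy.dropna(subset=["country"])

# Optional: renumber the rows 0..n-1 after dropping
cleaned_reset = cleaned.reset_index(drop=True)

print(len(toy), len(cleaned), list(cleaned.index), list(cleaned_reset.index))
```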


# Renaming columns

Rename the `duration (seconds)` column to `duration`.

In [28]:
df = df.rename(columns={"duration (seconds)": "duration"})

Check if it worked.

In [None]:
df.columns
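
By default `rename` silently ignores keys that do not match any column, so a typo in the old name goes unnoticed. A minimal sketch on a hypothetical `toy` frame shows a membership check and the `errors="raise"` option, which turns an unmatched key into a `KeyError`:

```python
import pandas as pd

# Hypothetical toy data with the original column name
toy = pd.DataFrame({"duration (seconds)": [20.0], "city": ["edna"]})

renamed = toy.rename(columns={"duration (seconds)": "duration"})
worked = "duration" in renamed.columns and "duration (seconds)" not in renamed.columns

# A misspelled key is ignored by default; errors="raise" makes the mistake visible
try:
    toy.rename(columns={"duration (secs)": "duration"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(worked, typo_detected)
```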

# Replacing values in specific columns

Replace all "us" entries in the `country` column with "united states".

In [30]:
df = df.replace({"country": {"us": "united states"}})

Check if it worked.

In [31]:
df.country

0        united states
1                  NaN
2                   gb
3        united states
4        united states
             ...      
80327    united states
80328    united states
80329    united states
80330    united states
80331    united states
Name: country, Length: 80332, dtype: object
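
The nested dictionary restricts the replacement to one column: the outer key names the column, the inner dictionary maps old values to new ones. The sketch below uses a hypothetical `toy` frame to show that other columns are left alone and that only whole values are matched, not substrings:

```python
import pandas as pd

# Hypothetical toy data; "us" also appears in another column and inside "australia"
toy = pd.DataFrame({
    "country": ["us", None, "australia"],
    "note": ["us", "tx", "gb"],
})

# Outer key = column, inner dict = {old value: new value}
nested = toy.replace({"country": {"us": "united states"}})

# Equivalent approach on the single column, assigned to a copy of the frame
by_series = toy.copy()
by_series["country"] = by_series["country"].replace("us", "united states")

print(nested)
```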