# M1L7 Data Challenge:  Data Manipulation 

 We'll continue to work with UFO sighting data.

### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder 

### **Objectives:**

- Use string methods to manipulate data
- Filter data
- Work more with dates in Python



**Let's get started!**

### Step 1:  Import Pandas & Numpy

In [1]:
# Import Pandas and NumPy
import pandas as pd
import numpy as np

### Step 2: Load the dataset (csv file stored in the data folder) into a Pandas DataFrame called `ufo`

- The file is called `ufo-sightings.csv`


In [2]:
ufo = pd.read_csv('ufo-sightings.csv')


### Step 3: Explore the Data

Use any method(s) of your choice to look at the data and explore it 


In [3]:
ufo.dtypes

Unnamed: 0                       int64
Date_time                       object
date_documented                 object
Year                             int64
Month                            int64
Hour                             int64
Season                          object
Country_Code                    object
Country                         object
Region                          object
Locale                          object
latitude                       float64
longitude                      float64
UFO_shape                       object
length_of_encounter_seconds    float64
Encounter_Duration              object
Description                     object
dtype: object

In [4]:
ufo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80328 entries, 0 to 80327
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Unnamed: 0                   80328 non-null  int64  
 1   Date_time                    80328 non-null  object 
 2   date_documented              80328 non-null  object 
 3   Year                         80328 non-null  int64  
 4   Month                        80328 non-null  int64  
 5   Hour                         80328 non-null  int64  
 6   Season                       80328 non-null  object 
 7   Country_Code                 80069 non-null  object 
 8   Country                      80069 non-null  object 
 9   Region                       79762 non-null  object 
 10  Locale                       79871 non-null  object 
 11  latitude                     80328 non-null  float64
 12  longitude                    80328 non-null  float64
 13  UFO_shape       

### Step 4:  Clean the UFO_shape column 
- Make the column all uppercase 
- Strip off any leading and trailing spaces 

Even if you can't see any stray spaces, it is good practice to trim them anyway: leading and trailing whitespace is easy to miss with the naked eye.

Hint:  You will use both `str.upper()` and `str.strip()` -- you can do it in one step or two separate steps 
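
As a small illustration of the hint, here is a sketch of chaining the two string methods on a made-up Series. The values in `shapes` are hypothetical and are not taken from the UFO data; missing values simply stay missing.

```python
import pandas as pd

# Hypothetical toy values (not from the UFO dataset) with mixed case and stray spaces
shapes = pd.Series([" light", "Circle ", "  disk  ", None])

# Chain the string methods: uppercase first, then trim leading/trailing whitespace
cleaned = shapes.str.upper().str.strip()

print(cleaned)
```

The order of `str.upper()` and `str.strip()` does not matter here, so doing it in one chained step or two separate steps gives the same result.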

In [10]:
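# Uppercase, then trim leading/trailing whitespace, and write the result back to the column
ufo['UFO_shape'] = ufo['UFO_shape'].str.upper().str.strip()
ufo_shape = ufo['UFO_shape']
print(ufo_shape)
ufo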

0        CYLINDER
1           LIGHT
2          CIRCLE
3          CIRCLE
4           LIGHT
           ...   
80323       LIGHT
80324      CIRCLE
80325       OTHER
80326      CIRCLE
80327       CIGAR
Name: UFO_shape, Length: 80328, dtype: object


Unnamed: 0.1,Unnamed: 0,Date_time,date_documented,Year,Month,Hour,Season,Country_Code,Country,Region,Locale,latitude,longitude,UFO_shape,length_of_encounter_seconds,Encounter_Duration,Description
0,0,1949-10-10 20:30:00,4/27/2004,1949,10,20,Autumn,USA,United States,Texas,San Marcos,29.883056,-97.941111,CYLINDER,2700.0,45 minutes,This event took place in early fall around 194...
1,1,1949-10-10 21:00:00,12/16/2005,1949,10,21,Autumn,USA,United States,Texas,Bexar County,29.384210,-98.581082,LIGHT,7200.0,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...
2,2,1955-10-10 17:00:00,1/21/2008,1955,10,17,Autumn,GBR,United Kingdom,England,Chester,53.200000,-2.916667,CIRCLE,20.0,20 seconds,Green/Orange circular disc over Chester&#44 En...
3,3,1956-10-10 21:00:00,1/17/2004,1956,10,21,Autumn,USA,United States,Texas,Edna,28.978333,-96.645833,CIRCLE,20.0,1/2 hour,My older brother and twin sister were leaving ...
4,4,1960-10-10 20:00:00,1/22/2004,1960,10,20,Autumn,USA,United States,Hawaii,Kaneohe,21.418056,-157.803611,LIGHT,900.0,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80323,80323,2013-09-09 21:15:00,9/30/2013,2013,9,21,Autumn,USA,United States,Tennessee,Nashville,36.165833,-86.784444,LIGHT,600.0,10 minutes,Round from the distance/slowly changing colors...
80324,80324,2013-09-09 22:00:00,9/30/2013,2013,9,22,Autumn,USA,United States,Idaho,Boise,43.613611,-116.202500,CIRCLE,1200.0,20 minutes,Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r...
80325,80325,2013-09-09 22:00:00,9/30/2013,2013,9,22,Autumn,USA,United States,California,Napa Abajo,38.297222,-122.284444,OTHER,1200.0,hour,Napa UFO&#44
80326,80326,2013-09-09 22:20:00,9/30/2013,2013,9,22,Autumn,USA,United States,Virginia,Vienna,38.901111,-77.265556,CIRCLE,5.0,5 seconds,Saw a five gold lit cicular craft moving fastl...


### Step 5:  Use `pd.crosstab` to sum the number of shapes seen by season

- Add a comment of a main takeaway from the output 
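
To see what `pd.crosstab` produces before running it on the full data, here is a minimal sketch on a made-up frame. The `toy` rows are hypothetical; only the column names mirror the dataset. The `margins=True` option is one possible way to get row and column totals without summing by hand.

```python
import pandas as pd

# Hypothetical toy sightings, for illustration only
toy = pd.DataFrame({
    "UFO_shape": ["LIGHT", "LIGHT", "CIRCLE", "DISK", "CIRCLE", "LIGHT"],
    "Season": ["Summer", "Winter", "Summer", "Autumn", "Summer", "Summer"],
})

# Rows are shapes, columns are seasons, each cell is a count of sightings
counts = pd.crosstab(toy["UFO_shape"], toy["Season"])

# margins=True appends row and column totals, labelled by margins_name
counts_with_totals = pd.crosstab(
    toy["UFO_shape"], toy["Season"], margins=True, margins_name="Total"
)
print(counts_with_totals)
```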

In [39]:
shapes_by_season = pd.crosstab(ufo['UFO_shape'], ufo['Season']).sum()
shapes_by_season
    

Season
Autumn    21261
Spring    15799
Summer    25768
Winter    15570
dtype: int64

In [34]:
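# Rebuild the full shape-by-season table (without .sum()) so both totals can be taken
shapes_by_season = pd.crosstab(ufo['UFO_shape'], ufo['Season'])

# Total number of sightings in each season (column sums)
season_totals = shapes_by_season.sum(axis=0)
print(season_totals)

# Add a Total column: sightings of each shape across all seasons (row sums)
shapes_by_season['Total'] = shapes_by_season.sum(axis=1)
print(shapes_by_season)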

#Add comment here:  
'''
Circle was the most common shape that appeared throughout all the seasons
There are a lot of unknown shapes
'''

Season
Autumn    21261
Spring    15799
Summer    25768
Winter    15570
dtype: int64
Season     Autumn  Spring  Summer  Winter  Total
UFO_shape                                       
CHANGED         0       0       1       0      1
CHANGING      552     443     565     402   1962
CHEVRON       333     215     225     179    952
CIGAR         519     421     783     334   2057
CIRCLE       2002    1464    2604    1537   7607
CONE           85      70      84      77    316
CRESCENT        1       1       0       0      2
CROSS          63      53      70      47    233
CYLINDER      332     272     449     230   1283
DELTA           3       1       2       1      7
DIAMOND       309     244     349     276   1178
DISK         1320    1052    1973     868   5213
DOME            0       1       0       0      1
EGG           176     159     260     164    759
FIREBALL     1789    1038    2025    1356   6208
FLARE           1       0       0       0      1
FLASH         387     253     420 

'\nCircle was the most common shape that appeared throughout all the seasons\nThere are a lot of unknown shapes\n\n'

In [40]:
# Run this cell without changes before moving on to step 6!

ufo['Date_time'] = pd.to_datetime(ufo['Date_time'], format="%Y-%m-%d %H:%M:%S")
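
For context on why this conversion matters, here is a short sketch using two made-up timestamp strings in the same `%Y-%m-%d %H:%M:%S` layout. The `raw` values are illustrative assumptions. Once a column holds real datetimes, the `.dt` accessor and comparisons such as `.max()` work on dates rather than on text.

```python
import pandas as pd

# Hypothetical timestamp strings in the same layout as Date_time
raw = pd.Series(["1949-10-10 20:30:00", "2013-09-09 21:15:00"])

# Parse the strings into datetime64 values using an explicit format
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

# The .dt accessor exposes individual date parts
print(parsed.dt.year.tolist())
print(parsed.dt.hour.tolist())
```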

### Step 6:  Filter the data where the region is equal to `New York`
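
Filtering with `.loc` relies on a boolean mask: a Series of `True`/`False` values, one per row. The sketch below shows the idea on a hypothetical four-row frame; the `toy` data is made up for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "Region": ["New York", "Texas", "New York", None],
    "UFO_shape": ["DISK", "LIGHT", "UNKNOWN", "CIRCLE"],
})

# The comparison yields one True/False per row; missing regions compare as False
mask = toy["Region"] == "New York"

# .loc keeps only the rows where the mask is True (original index labels are preserved)
ny_only = toy.loc[mask]
print(ny_only)
```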

In [45]:
ufo_in_ny = ufo.loc[ufo['Region'] == 'New York']
ufo_in_ny


Unnamed: 0.1,Unnamed: 0,Date_time,date_documented,Year,Month,Hour,Season,Country_Code,Country,Region,Locale,latitude,longitude,UFO_shape,length_of_encounter_seconds,Encounter_Duration,Description
12,12,1970-10-10 16:00:00,5/11/2000,1970,10,16,Autumn,USA,United States,New York,Nassau County,40.668611,-73.527500,DISK,1800.0,30 min.,silver disc seen by family and neighbors
27,27,1978-10-10 02:00:00,2/1/2007,1978,10,2,Autumn,USA,United States,New York,Alden Manor,40.700833,-73.713333,RECTANGLE,300.0,5min,A memory I will never forget that happened men...
28,28,1979-10-10 00:00:00,4/16/2005,1979,10,0,Autumn,USA,United States,New York,Poughkeepsie,41.700278,-73.921389,CHEVRON,900.0,15 minutes,1/4 moon-like&#44 its &#39chord&#39 or flat s...
38,38,1984-10-10 22:00:00,8/10/1999,1984,10,22,Autumn,USA,United States,New York,White Plains,41.033889,-73.763333,FORMATION,20.0,15-20 seconds,Saw a hugh object in sky with lights intermitt...
40,40,1986-10-10 20:00:00,10/8/2007,1986,10,20,Autumn,USA,United States,New York,Holmes,41.523427,-73.646795,CHEVRON,180.0,3 minutes,Football Field Sized Chevron with bright white...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80276,80276,2011-09-09 20:30:00,5/13/2012,2011,9,20,Autumn,USA,United States,New York,Rochester,43.154722,-77.615833,CIRCLE,60.0,1min,Amber/orange balls 1 following the other chang...
80279,80279,2011-09-09 21:15:00,10/10/2011,2011,9,21,Autumn,USA,United States,New York,Montauk,41.035833,-71.955000,CYLINDER,2100.0,approx. 35 mins,Yellow to orange colors radiating above and be...
80299,80299,2012-09-09 20:10:00,9/24/2012,2012,9,20,Autumn,USA,United States,New York,Alden Manor,40.700833,-73.713333,CIRCLE,600.0,10 minutes,Orange lights seen in Elmont&#44 Long Island&#...
80304,80304,2012-09-09 21:00:00,9/24/2012,2012,9,21,Autumn,USA,United States,New York,New York,40.714167,-74.006389,LIGHT,1290.0,21:30,Glowing&#44 circular lights visible in the clo...


### Step 7:  Get the most recent `Date_time` that a UFO was sighted in New York 

Hint:  Make sure you saved your filtered data from Step 6 to a new DataFrame object (that is, a variable).  You can use `.max()` right after a column name to get the max of that column

You are using the `Date_time` column for this question
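
As a quick illustration, `.max()` and `.min()` on a datetime column return the latest and earliest timestamps. The values in `times` below are hypothetical.

```python
import pandas as pd

# Hypothetical datetimes, deliberately out of order
times = pd.Series(pd.to_datetime([
    "2011-09-09 20:30:00",
    "2012-09-09 21:00:00",
    "1970-10-10 16:00:00",
]))

latest = times.max()     # most recent timestamp
earliest = times.min()   # oldest timestamp
print(latest, earliest)
```

This only orders correctly when the column is a real datetime type; on plain strings, `.max()` compares text.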

In [48]:
ufo_time = ufo_in_ny['Date_time'].max()
print(ufo_time)

2014-05-05 23:04:00


## Above and Beyond (AAB)  -- OPTIONAL

### Question 1:  How many days have passed between the first UFO sighting in NY and the most recent sighting in NY based on this data?
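
Subtracting two timestamps gives a pandas `Timedelta`. The sketch below uses two made-up dates (not from the dataset) to show how to read off whole days and fractional days from it.

```python
import pandas as pd

# Hypothetical start and end timestamps (2000 is a leap year, so February has 29 days)
first = pd.Timestamp("2000-01-01 00:00:00")
last = pd.Timestamp("2000-03-01 12:00:00")

elapsed = last - first  # a pandas Timedelta

whole_days = elapsed.days                          # integer day component
fractional_days = elapsed / pd.Timedelta(days=1)   # days as a float, including the time part
print(whole_days, fractional_days)
```

Dividing a `Timedelta` by a plain number returns another `Timedelta`, so use `.days` or divide by `pd.Timedelta(days=1)` when you want a number.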

In [53]:
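ufo_days = ufo_in_ny['Date_time'].max() - ufo_in_ny['Date_time'].min()
print(ufo_days)
# Whole days elapsed, plus a rough conversion to years (365-day years).
# Dividing the Timedelta itself by 365 would print a misleading "83 days ..." value.
print(ufo_days.days)
print(f"{ufo_days.days / 365:.1f} years")

30654 days 01:04:00
30654
84.0 years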


### Question 2:  Filter the data where UFO_shape is `UNKNOWN` and the Region is `New York` 
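
Combining two conditions uses `&` (element-wise "and"), with each comparison wrapped in parentheses because `&` binds more tightly than `==`. Here is a minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the dataset
toy = pd.DataFrame({
    "Region": ["New York", "Texas", "New York", "Texas"],
    "UFO_shape": ["DISK", "UNKNOWN", "UNKNOWN", "LIGHT"],
})

# Build each condition separately, then combine them row by row
in_ny = toy["Region"] == "New York"
is_unknown = toy["UFO_shape"] == "UNKNOWN"
both = toy.loc[in_ny & is_unknown]

# Equivalent one-liner; the parentheses around each comparison are required
both_inline = toy.loc[(toy["Region"] == "New York") & (toy["UFO_shape"] == "UNKNOWN")]
print(both)
```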

In [64]:
ufo_unknown_ny = ufo.loc[(ufo['Region'] == 'New York') & (ufo['UFO_shape'] == 'UNKNOWN')]
ufo_unknown_ny

Unnamed: 0.1,Unnamed: 0,Date_time,date_documented,Year,Month,Hour,Season,Country_Code,Country,Region,Locale,latitude,longitude,UFO_shape,length_of_encounter_seconds,Encounter_Duration,Description
661,661,1999-10-01 17:00:00,2/18/2001,1999,10,17,Autumn,USA,United States,New York,New York,40.714167,-74.006389,UNKNOWN,5.0,5 seconds,I witnessed a being in the middle of the day i...
816,816,2006-10-01 21:00:00,10/30/2006,2006,10,21,Autumn,USA,United States,New York,Village of Orchard Park,42.767500,-78.744167,UNKNOWN,37800.0,approx. 1 1/2 hours,Spotted again as before........
923,923,2011-10-01 00:00:00,12/12/2011,2011,10,0,Autumn,USA,United States,New York,New York,40.579532,-74.150201,UNKNOWN,600.0,10 minutes,Huge bright fireball descends over Staten Island.
1059,1059,2003-10-12 02:00:00,11/26/2003,2003,10,2,Autumn,USA,United States,New York,Lark Street,42.652500,-73.756667,UNKNOWN,30.0,30 seconds,object emmited bright light then sped off in a...
1495,1495,1997-10-14 16:00:00,8/5/2001,1997,10,16,Autumn,USA,United States,New York,Syracuse,43.048056,-76.147778,UNKNOWN,30.0,30 sec. max,4 Military planes fly past flying rod&#44 and...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
78528,78528,2012-09-29 20:30:00,10/30/2012,2012,9,20,Autumn,USA,United States,New York,Buffalo,42.886389,-78.878611,UNKNOWN,300.0,3-5 minutes,6 orange reddish&#44 flashing lights&#44 slow...
79188,79188,2010-09-04 20:51:00,11/21/2010,2010,9,20,Autumn,USA,United States,New York,Middletown,41.445833,-74.423333,UNKNOWN,3.0,3 sec.,Object dips up and down and zig zags.
79195,79195,2010-09-04 22:00:00,11/21/2010,2010,9,22,Autumn,USA,United States,New York,Lindenhurst,40.686667,-73.373889,UNKNOWN,90.0,1 min 30 sec,It was a very bright light&#44 extremly high u...
80219,80219,2008-09-09 03:00:00,10/31/2008,2008,9,3,Autumn,USA,United States,New York,Lake Grove,40.852778,-73.115556,UNKNOWN,12.0,10 - 12 seconds,Light in the sky which moved in a way that is ...
