# M1L7 Data Challenge: Data Manipulation

We'll continue to work with UFO sighting data.

### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder 

### **Objectives:**

- Use string methods to manipulate data 
- Filter Data 
- Work more with dates in Python



**Let's get started!**

### Step 1: Import Pandas & NumPy

```python
# Import pandas, NumPy, and the datetime module
import pandas as pd
import numpy as np
import datetime as dt
```

### Step 2: Load the dataset (CSV file stored in the data folder) into a Pandas DataFrame called `ufo`

- The file is called `ufo-sightings-transformed.csv`


```python
# Path assumes the CSV sits in a data/ folder next to this notebook; adjust it to your setup
ufo = pd.read_csv("data/ufo-sightings-transformed.csv")
```
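If you would rather have `Date_time` parsed while loading instead of converting it later, `pd.read_csv` accepts a `parse_dates` argument. This is a minimal sketch on a tiny in-memory CSV (hypothetical rows, read through `io.StringIO`) so it runs without the real file; the real dataset has many more columns.

```python
import io
import pandas as pd

# Hypothetical two-row sample mimicking the Date_time and UFO_shape columns
sample_csv = io.StringIO(
    "Date_time,UFO_shape\n"
    "1970-10-10 16:00:00,Disk\n"
    "2014-05-05 23:04:00,light\n"
)

# parse_dates converts the listed column(s) to datetime64 during loading
sample = pd.read_csv(sample_csv, parse_dates=["Date_time"])
print(sample.dtypes)
```

The challenge itself converts `Date_time` in a separate cell before Step 6, so either approach ends with the same column type.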


### Step 3: Explore the Data

Use any method(s) of your choice to look at the data and explore it 


```python
ufo.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80328 entries, 0 to 80327
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Unnamed: 0                   80328 non-null  int64  
 1   Date_time                    80328 non-null  object 
 2   date_documented              80328 non-null  object 
 3   Year                         80328 non-null  int64  
 4   Month                        80328 non-null  int64  
 5   Hour                         80328 non-null  int64  
 6   Season                       80328 non-null  object 
 7   Country_Code                 80069 non-null  object 
 8   Country                      80069 non-null  object 
 9   Region                       79762 non-null  object 
 10  Locale                       79871 non-null  object 
 11  latitude                     80328 non-null  float64
 12  longitude                    80328 non-null  float64
 13  UFO_shape       
```
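`.info()` shows that columns such as `Country_Code`, `Region`, and `Locale` have fewer non-null values than there are rows. Two other quick exploration checks are `isna().sum()`, which counts missing values per column, and `value_counts()`, which tallies each category. This sketch runs them on a hypothetical toy frame rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-version of the UFO data with a few gaps
toy = pd.DataFrame({
    "Region": ["New York", "Texas", np.nan, "New York"],
    "UFO_shape": ["light", "Disk", "light", np.nan],
})

# Missing values per column (the complement of the Non-Null Count in .info())
missing = toy.isna().sum()
print(missing)

# Frequency of each shape; NaN is excluded by default
shape_counts = toy["UFO_shape"].value_counts()
print(shape_counts)
```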

### Step 4:  Clean the UFO_shape column 
- Make the column all uppercase 
- Strip off any leading and trailing spaces 

Even if you can't see any spaces, it is still good practice to trim them: leading and trailing whitespace is invisible to the naked eye.

Hint:  You will use both `str.upper()` and `str.strip()` -- you can do it in one step or two separate steps 

```python
# Uppercase every shape, then trim leading/trailing whitespace
ufo['UFO_shape'] = ufo['UFO_shape'].str.upper().str.strip()
print(ufo['UFO_shape'])
```

```
0        CYLINDER
1           LIGHT
2          CIRCLE
3          CIRCLE
4           LIGHT
           ...   
80323       LIGHT
80324      CIRCLE
80325       OTHER
80326      CIRCLE
80327       CIGAR
Name: UFO_shape, Length: 80328, dtype: object
```
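To see why this cleanup matters, consider labels that differ only in case or surrounding spaces: pandas treats each spelling as a separate category until they are normalized. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical shape labels with inconsistent case and stray whitespace
shapes = pd.Series(["light", " Light", "LIGHT ", "disk"])
print(shapes.nunique())   # 4 distinct raw strings

cleaned = shapes.str.upper().str.strip()
print(cleaned.nunique())  # 2 distinct values after cleaning
print(cleaned.value_counts())
```

Without the cleanup, counts like the crosstab in Step 5 would split one shape across several rows.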


### Step 5:  Use `pd.crosstab` to sum the number of shapes seen by season

- Add a comment with the main takeaway from the output 

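```python
# Count each shape per season, then sum down each season's column
season_totals = pd.crosstab(ufo['UFO_shape'], ufo['Season']).sum()
print(season_totals)
# Takeaway: Summer has the most sightings (25,768), followed by Autumn (21,261);
# Spring (15,799) and Winter (15,570) have noticeably fewer.
```

```
Season
Autumn    21261
Spring    15799
Summer    25768
Winter    15570
dtype: int64
```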
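Calling `.sum()` on the crosstab collapses the shape dimension, leaving only one total per season. If you also want to keep the per-shape breakdown, `pd.crosstab` accepts `margins=True`, which appends a totals row and column labeled `All`. A sketch on hypothetical toy sightings:

```python
import pandas as pd

# Hypothetical sightings: one row per report
toy = pd.DataFrame({
    "UFO_shape": ["LIGHT", "LIGHT", "DISK", "LIGHT", "DISK"],
    "Season":    ["Summer", "Winter", "Summer", "Summer", "Autumn"],
})

# Counts of each shape per season, plus row and column totals named "All"
table = pd.crosstab(toy["UFO_shape"], toy["Season"], margins=True)
print(table)
```

The bottom `All` row holds the same per-season totals that `.sum()` produces, while the body still shows which shapes drive them.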


```python
# Run this cell without changes before moving on to Step 6!
# Convert Date_time from strings to datetime64 using its exact format
ufo['Date_time'] = pd.to_datetime(ufo['Date_time'], format="%Y-%m-%d %H:%M:%S")
```
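The `format` string tells `pd.to_datetime` exactly how each value is laid out, which avoids ambiguous guesses such as month-first versus day-first. This sketch, on hypothetical strings, also shows `errors="coerce"` (unparseable values become `NaT` instead of raising) and the `.dt` accessor, which pulls out components like the year and hour that the dataset also stores in its `Year` and `Hour` columns.

```python
import pandas as pd

raw = pd.Series(["1970-10-10 16:00:00", "2014-05-05 23:04:00", "not a date"])

# Strings that don't match the format become NaT instead of raising an error
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# The .dt accessor exposes datetime components
print(parsed.dt.year)
print(parsed.dt.hour)
print(parsed.isna().sum())  # one unparseable value
```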

### Step 6:  Filter the data where the region is equal to `New York`

```python
# Boolean mask keeps only rows whose Region is New York
ufo_ny = ufo[ufo['Region'] == 'New York']
ufo_ny.head()
```

```
Unnamed: 0.1,Unnamed: 0,Date_time,date_documented,Year,Month,Hour,Season,Country_Code,Country,Region,Locale,latitude,longitude,UFO_shape,length_of_encounter_seconds,Encounter_Duration,Description
12,12,1970-10-10 16:00:00,5/11/2000,1970,10,16,Autumn,USA,United States,New York,Nassau County,40.668611,-73.5275,DISK,1800.0,30 min.,silver disc seen by family and neighbors
27,27,1978-10-10 02:00:00,2/1/2007,1978,10,2,Autumn,USA,United States,New York,Alden Manor,40.700833,-73.713333,RECTANGLE,300.0,5min,A memory I will never forget that happened men...
28,28,1979-10-10 00:00:00,4/16/2005,1979,10,0,Autumn,USA,United States,New York,Poughkeepsie,41.700278,-73.921389,CHEVRON,900.0,15 minutes,1/4 moon-like&#44 its &#39chord&#39 or flat s...
38,38,1984-10-10 22:00:00,8/10/1999,1984,10,22,Autumn,USA,United States,New York,White Plains,41.033889,-73.763333,FORMATION,20.0,15-20 seconds,Saw a hugh object in sky with lights intermitt...
40,40,1986-10-10 20:00:00,10/8/2007,1986,10,20,Autumn,USA,United States,New York,Holmes,41.523427,-73.646795,CHEVRON,180.0,3 minutes,Football Field Sized Chevron with bright white...
```
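The filter works by building a boolean mask (one `True`/`False` per row) and indexing the DataFrame with it. The same selection can be written with `.query()`, and `.isin()` handles several allowed values at once. A sketch on a hypothetical toy frame; the `ny_or_nj` name is only for illustration:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Region": ["New York", "Texas", "New Jersey", "New York"],
    "UFO_shape": ["DISK", "LIGHT", "CIRCLE", "CHEVRON"],
})

# Boolean mask: True where Region equals "New York"
mask = toy["Region"] == "New York"
ny = toy[mask]

# Equivalent filter written as a query string
ny_query = toy.query("Region == 'New York'")

# .isin() keeps rows matching any value in the list
ny_or_nj = toy[toy["Region"].isin(["New York", "New Jersey"])]
print(len(ny), len(ny_or_nj))
```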


### Step 7:  Get the most recent `Date_time` that a UFO was sighted in New York 

Hint: Make sure you saved your filtered data from Step 6 to a new DataFrame variable. You can use `.max()` right after a column name to get the maximum value of that column.

You are using the `Date_time` column for this question

```python
# Date_time was already converted before Step 6, so .max() returns the latest Timestamp
ufo_ny['Date_time'].max()
```

```
Timestamp('2014-05-05 23:04:00')
```
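`.max()` returns only the latest timestamp. To see the whole record for that sighting, `.idxmax()` returns the index label of the maximum, which `.loc` can then fetch. A sketch on hypothetical rows (the `Locale` values are placeholders):

```python
import pandas as pd

# Hypothetical toy sightings
toy = pd.DataFrame({
    "Date_time": pd.to_datetime(
        ["1970-10-10 16:00:00", "2014-05-05 23:04:00", "1999-01-01 12:00:00"]
    ),
    "Locale": ["A", "B", "C"],
})

latest = toy["Date_time"].max()
# idxmax gives the index label of the latest date; .loc returns that full row
latest_row = toy.loc[toy["Date_time"].idxmax()]
print(latest)
print(latest_row["Locale"])
```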

## Above and Beyond (AAB)  -- OPTIONAL

### Question 1:  How many days have passed between the first UFO sighting in NY and the most recent sighting in NY based on this data?

```python
# Reuse the New York subset from Step 6; Date_time is already a datetime column
first_date = ufo_ny['Date_time'].min()
last_date = ufo_ny['Date_time'].max()
# Subtracting two Timestamps gives a Timedelta; .days keeps only the whole days
days_passed = (last_date - first_date).days
print(days_passed)
```

```
30654
```
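Subtracting one `Timestamp` from another produces a `pd.Timedelta`. Its `.days` attribute counts whole days only and drops any leftover hours, which is the number reported above. If you need the fractional part, divide by a one-day `Timedelta`. A sketch with two hypothetical timestamps:

```python
import pandas as pd

first_date = pd.Timestamp("2014-05-01 08:00:00")
last_date = pd.Timestamp("2014-05-05 23:04:00")

gap = last_date - first_date          # a pd.Timedelta
print(gap)                            # 4 days 15:04:00
print(gap.days)                       # 4: the 15 hours 4 minutes are dropped
print(gap / pd.Timedelta(days=1))     # fractional number of days
```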


### Question 2:  Filter the data where UFO_shape is `UNKNOWN` and the Region is `New York` 

```python
# UFO_shape was uppercased in Step 4, so compare against "UNKNOWN", not "Unknown"
ufo_ny_unknown = ufo[(ufo['UFO_shape'] == 'UNKNOWN') & (ufo['Region'] == 'New York')]
ufo_ny_unknown
```
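Two details make this filter work. Each condition needs its own parentheses because `&` binds more tightly than `==` in Python, and string comparison is case-sensitive, so `"Unknown"` never matches the uppercased `"UNKNOWN"`. If you cannot be sure a column was already cleaned, normalizing case inside the mask is one option. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy data with mixed-case labels
toy = pd.DataFrame({
    "UFO_shape": ["UNKNOWN", "Unknown", "LIGHT", "UNKNOWN"],
    "Region": ["New York", "New York", "New York", "Texas"],
})

# Exact comparison is case-sensitive: "Unknown" != "UNKNOWN"
exact = toy[(toy["UFO_shape"] == "UNKNOWN") & (toy["Region"] == "New York")]

# Uppercasing inside the mask matches both spellings without modifying the column
normalized = toy[(toy["UFO_shape"].str.upper() == "UNKNOWN") & (toy["Region"] == "New York")]

print(len(exact), len(normalized))
```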
