<a href="https://colab.research.google.com/github/DavoodSZ1993/Python_Tutorial/blob/main/pandas_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importing Useful Modules

In [2]:
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip --quiet

In [4]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython import display

## Downloading the UFO Dataset from Kaggle

In [6]:
# opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.
!pip install opendatasets --upgrade --quiet

In [7]:
import opendatasets as od

dataset_url = 'https://www.kaggle.com/datasets/NUFORC/ufo-sightings'
od.download(dataset_url)

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: davoodsoleymanzadeh
Your Kaggle Key: ··········
Downloading ufo-sightings.zip to ./ufo-sightings


100%|██████████| 10.2M/10.2M [00:00<00:00, 81.7MB/s]

## Exploring UFO Dataset

In [8]:
from pandas_profiling import ProfileReport

df = pd.read_csv('ufo-sightings/scrubbed.csv')
df.head(5)  # Show the first 5 rows of the dataset.

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,45 minutes,This event took place in early fall around 194...,4/27/2004,29.8830556,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.9783333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.4180556,-157.803611


In [9]:
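# Build the profile report from a copy of the data.

profile = ProfileReport(df.copy(), title='UFO Report', html={'style': {'full_width': True}})
# profile.to_notebook_iframe()  # Uncomment to render the report inside the notebook.

# Save the report as a standalone HTML file.
profile.to_file(output_file='ufo_report.html')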

display.clear_output()


### More useful functions:

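* `df.describe()`: summary statistics for each column (pass `include='all'` to cover non-numeric columns as well).

* `df.info()`: column names, non-null counts, dtypes, and memory usage.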

In [10]:
df.describe(include='all')

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
count,80332,80332,74535,70662,78400,80332.0,80332,80317,80332,80332.0,80332.0
unique,69586,19900,67,5,29,706.0,8349,79997,317,23312.0,
top,7/4/2010 22:00,seattle,ca,us,light,300.0,5 minutes,Fireball,12/12/2009,47.6063889,
freq,36,525,9655,65114,16565,7070.0,4716,11,1510,481.0,
mean,,,,,,,,,,,-86.772885
std,,,,,,,,,,,39.697205
min,,,,,,,,,,,-176.658056
25%,,,,,,,,,,,-112.073333
50%,,,,,,,,,,,-87.903611
75%,,,,,,,,,,,-78.755


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80332 entries, 0 to 80331
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   datetime              80332 non-null  object 
 1   city                  80332 non-null  object 
 2   state                 74535 non-null  object 
 3   country               70662 non-null  object 
 4   shape                 78400 non-null  object 
 5   duration (seconds)    80332 non-null  object 
 6   duration (hours/min)  80332 non-null  object 
 7   comments              80317 non-null  object 
 8   date posted           80332 non-null  object 
 9   latitude              80332 non-null  object 
 10  longitude             80332 non-null  float64
dtypes: float64(1), object(10)
memory usage: 6.7+ MB
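
The non-null counts reported by `df.info()` can be cross-checked by counting missing values directly. The following is a minimal sketch on a small hypothetical frame; the `toy` data is made up for illustration and is not taken from the UFO dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "city": ["seattle", "boise", "napa"],
    "state": ["wa", None, "ca"],
    "latitude": [47.6, np.nan, 38.3],
})

# Missing values per column: number of rows minus the non-null count shown by info().
missing_per_column = toy.isna().sum()
print(missing_per_column)

# By default, describe() summarises numeric columns only.
numeric_summary = toy.describe()
print(numeric_summary)
```

On the real data, the same `isna().sum()` call should agree with the gaps visible in the `state`, `country`, `shape`, and `comments` columns above.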


## Exploring and Analysing the Dataset:

* `df.drop()`: drops the specified labels from rows or columns and returns a new DataFrame. The original is left unchanged unless the result is assigned back.

In [12]:
df1 = df.copy()

df1.drop(['duration (hours/min)'], axis=1)

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,This event took place in early fall around 194...,4/27/2004,29.8830556,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20,My older brother and twin sister were leaving ...,1/17/2004,28.9783333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.4180556,-157.803611
...,...,...,...,...,...,...,...,...,...,...
80327,9/9/2013 21:15,nashville,tn,us,light,600.0,Round from the distance/slowly changing colors...,9/30/2013,36.165833,-86.784444
80328,9/9/2013 22:00,boise,id,us,circle,1200.0,Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r...,9/30/2013,43.613611,-116.202500
80329,9/9/2013 22:00,napa,ca,us,other,1200.0,Napa UFO&#44,9/30/2013,38.297222,-122.284444
80330,9/9/2013 22:20,vienna,va,us,circle,5.0,Saw a five gold lit cicular craft moving fastl...,9/30/2013,38.901111,-77.265556
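
Because `drop()` returns a new DataFrame, the call above displays the result but does not change `df1`. A short sketch of this behaviour, using a hypothetical two-row frame (`toy` and `trimmed` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical toy frame standing in for the UFO data.
toy = pd.DataFrame({
    "city": ["edna", "kaneohe"],
    "duration (seconds)": [20, 900],
    "duration (hours/min)": ["1/2 hour", "15 minutes"],
})

# drop() returns a new DataFrame; `toy` itself keeps all three columns.
trimmed = toy.drop(columns=["duration (hours/min)"])

print(list(toy.columns))      # original frame, still three columns
print(list(trimmed.columns))  # new frame without the dropped column
```

To keep the change, assign the result back, for example `df1 = df1.drop(['duration (hours/min)'], axis=1)`.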


* `df.columns.values`: the column labels of the DataFrame as a NumPy array.

In [13]:
cols = list(df.columns.values)
cols

['datetime',
 'city',
 'state',
 'country',
 'shape',
 'duration (seconds)',
 'duration (hours/min)',
 'comments',
 'date posted',
 'latitude',
 'longitude ']
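
Note that the last label in the list above, `'longitude '`, carries a trailing space, so `df['longitude']` would raise a `KeyError`. One possible fix, sketched here on a hypothetical one-row frame, is to strip whitespace from every column label:

```python
import pandas as pd

# Hypothetical frame reproducing the trailing space in the 'longitude ' label.
toy = pd.DataFrame({"latitude": [29.88], "longitude ": [-97.94]})

# Strip leading and trailing whitespace from every column label.
toy.columns = toy.columns.str.strip()

cleaned_cols = list(toy.columns)
print(cleaned_cols)
```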

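* `pd.to_numeric()`: converts its argument to a numeric type. With `errors='coerce'`, values that cannot be parsed become `NaN`, which is why the non-null count of `latitude` drops from 80332 to 80331 below.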

In [14]:
df['latitude'] = pd.to_numeric(df['latitude'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80332 entries, 0 to 80331
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   datetime              80332 non-null  object 
 1   city                  80332 non-null  object 
 2   state                 74535 non-null  object 
 3   country               70662 non-null  object 
 4   shape                 78400 non-null  object 
 5   duration (seconds)    80332 non-null  object 
 6   duration (hours/min)  80332 non-null  object 
 7   comments              80317 non-null  object 
 8   date posted           80332 non-null  object 
 9   latitude              80331 non-null  float64
 10  longitude             80332 non-null  float64
dtypes: float64(2), object(9)
memory usage: 6.7+ MB


In [15]:
df['duration (seconds)'] = pd.to_numeric(df['duration (seconds)'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80332 entries, 0 to 80331
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   datetime              80332 non-null  object 
 1   city                  80332 non-null  object 
 2   state                 74535 non-null  object 
 3   country               70662 non-null  object 
 4   shape                 78400 non-null  object 
 5   duration (seconds)    80329 non-null  float64
 6   duration (hours/min)  80332 non-null  object 
 7   comments              80317 non-null  object 
 8   date posted           80332 non-null  object 
 9   latitude              80331 non-null  float64
 10  longitude             80332 non-null  float64
dtypes: float64(3), object(8)
memory usage: 6.7+ MB
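
The effect of `errors='coerce'` is easier to see on a small example. The sketch below uses a hypothetical Series with one malformed entry (the values are made up, not copied from the dataset) and also shows one way to list the entries that failed to convert:

```python
import pandas as pd

# Hypothetical values: two clean numbers and one malformed entry.
raw = pd.Series(["2700", "20", "8`"])

# Unparseable strings become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)

# Entries that were present in the input but could not be parsed.
bad = raw[converted.isna() & raw.notna()]
print(bad)
```

Inspecting `bad` before discarding or repairing rows helps confirm that coercion only affected genuinely malformed values.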


* `df.sort_values()`: sorts by the values along either axis.

In [16]:
df = df.sort_values('duration (seconds)')
df.head()

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
4081,10/23/2008 04:45,remote,wy,,flash,0.001,0.001sec,brilliant strobe light at 4am&#44 moving light...,1/10/2009,-46.163992,169.87505
70587,8/30/2002 13:45,kerry (republic of ireland),,,sphere,0.01,0.01secs,The object seemed to move at lightining speed,9/13/2002,52.154461,-9.566863
42378,5/15/1987 23:00,island lake,il,us,light,0.01,milliseconds,4 red laser like lines,1/12/2012,42.276111,-88.191944
70393,8/29/2002 23:45,toledo,or,us,triangle,0.01,millisecond,The object I saw was very clear and moved at ...,9/6/2002,44.621667,-123.937222
52996,6/30/2002 03:15,helsinki (finland),,,unknown,0.01,0.01sec,Overpassing UFO,7/1/2002,60.173324,24.941025


In [17]:
df = df.sort_values(['duration (seconds)', 'latitude'], ascending=False)
df.head()

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
559,10/1/1983 17:00,birmingham (uk/england),,gb,sphere,97836000.0,31 years,Firstly&#44 I was stunned and stared at the ob...,4/12/2013,52.466667,-1.916667
53384,6/3/2010 23:30,ottawa (canada),on,ca,other,82800000.0,23000hrs,((HOAX??)) I was out in a field near mil&#44 ...,7/6/2010,45.416667,-75.7
74660,9/15/1991 18:00,greenbrier,ar,us,light,66276000.0,21 years,Orange or amber balls or orbs of light multipl...,3/31/2008,35.233889,-92.3875
64390,8/10/2012 21:00,finley,wa,us,light,52623200.0,2 months,There have been several flying objects in a pe...,8/19/2012,46.154167,-119.032778
38261,4/2/1983 24:00,dont know,,,,52623200.0,2 months,Hi&#44 I&#8217;m writing to you because I wan...,7/25/2004,41.730561,-78.682099
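
When sorting by several columns, `ascending` also accepts a list with one flag per column, and the original index labels travel with the rows (as in the output above). A minimal sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical toy frame; two rows tie on duration.
toy = pd.DataFrame({
    "duration (seconds)": [900.0, 20.0, 20.0],
    "latitude": [21.4, 53.2, 29.0],
})

# Sort by duration descending, then break ties by latitude ascending.
ordered = toy.sort_values(
    ["duration (seconds)", "latitude"],
    ascending=[False, True],
)
print(ordered)

# Optionally replace the shuffled index labels with 0..n-1.
renumbered = ordered.reset_index(drop=True)
print(renumbered)
```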


## Further Reading:

* [User Guide for Pandas](https://pandas.pydata.org/docs/user_guide/index.html)