## EDA with Python

#### Overview
| Column Name       | Type  | Description                                                                                           |
|-------------------|-------|-------------------------------------------------------------------------------------------------------|
| number_of_strikes | int64 | The total count of lightning strikes in that geographic tile on a given date                          |
| center_point_geom | str   | String representing the geographic center point of the strikes, based on their latitude and longitude |
| date              | str   | The recorded date (format: YYYY-MM-DD)                                                                |

In this notebook, we will use pandas to examine 2018 lightning strike data collected by the National Oceanic and Atmospheric Administration (NOAA). Then, we will calculate the total number of strikes for each month and plot this information on a bar graph.



### Import packages and libraries

Before getting started, we will need to import all the required libraries and extensions.

In [1]:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt

In [2]:
# Read in the 2018 lightning strike dataset.
df = pd.read_csv('eda_using_basic_data_functions_in_python_dataset1.csv')

In [4]:
df.head()

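| Unnamed: 0 | date       | number_of_strikes | center_point_geom |
|------------|------------|-------------------|-------------------|
| 0          | 2018-01-03 | 194               | POINT(-75 27)     |
| 1          | 2018-01-03 | 41                | POINT(-78.4 29)   |
| 2          | 2018-01-03 | 33                | POINT(-73.9 27)   |
| 3          | 2018-01-03 | 38                | POINT(-73.8 27)   |
| 4          | 2018-01-03 | 92                | POINT(-79 28)     |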
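
A column named `Unnamed: 0` usually means a row index was written into the CSV when the file was saved. As a possible alternative to the plain `read_csv` call, the sketch below loads a small in-memory sample with `index_col=0`, so that column becomes the index, and with `parse_dates`, so `date` is parsed on load. The `sample_csv` string is a stand-in built from the first rows of the preview, not the real file, and it assumes the file's first header field is empty.

```python
import io

import pandas as pd

# Stand-in for the real CSV (assumption): same layout as the preview,
# including the unnamed leading index column.
sample_csv = """,date,number_of_strikes,center_point_geom
0,2018-01-03,194,POINT(-75 27)
1,2018-01-03,41,POINT(-78.4 29)
2,2018-01-03,33,POINT(-73.9 27)
"""

# index_col=0 uses the first column as the row index instead of keeping it
# as an "Unnamed: 0" data column; parse_dates converts "date" while reading.
sample = pd.read_csv(io.StringIO(sample_csv), index_col=0, parse_dates=["date"])

print(sample.columns.tolist())
print(sample.dtypes)
```

With the real dataset, the same two keyword arguments could be passed to the existing `pd.read_csv(...)` call, which would make the later `pd.to_datetime` step unnecessary.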


In [6]:
# Convert the date column from strings to datetime
df['date'] = pd.to_datetime(df['date'])
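
To see what this conversion changes, it can help to compare the column's dtype before and after on a tiny example. The `demo` frame below is hypothetical, and the explicit `format="%Y-%m-%d"` is an assumption based on how the dates appear in the preview. Passing a format is optional, but it makes parsing stricter.

```python
import pandas as pd

# Hypothetical two-row frame with dates stored as plain strings.
demo = pd.DataFrame({"date": ["2018-01-03", "2018-12-28"]})
print(demo["date"].dtype)  # a string/object dtype before conversion

# Parse the strings into datetime values using an explicit format.
demo["date"] = pd.to_datetime(demo["date"], format="%Y-%m-%d")
print(demo["date"].dtype)  # a datetime64 dtype after conversion

# The .dt accessor is only available once the column holds datetimes.
print(demo["date"].dt.month.tolist())
```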

In [7]:
# Extracting the month
df['month'] = df['date'].dt.month

# To get the full month name instead of the month number, use dt.strftime()
df['month_name'] = df['date'].dt.strftime('%B')

In [10]:
# Abbreviated month name (for example, Jan)
df['month_name1'] = df['date'].dt.strftime('%b')
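
As an alternative to `dt.strftime()`, pandas also provides `dt.month_name()`, which returns English month names by default. The sketch below uses a hypothetical `dates` series and slices the full names to three characters to get abbreviations. This shortcut matches `'%b'` for English names only.

```python
import pandas as pd

# Hypothetical series of two dates taken from the range shown in the preview.
dates = pd.Series(pd.to_datetime(["2018-01-03", "2018-12-28"]))

# Full month names; English unless a locale argument is supplied.
full_names = dates.dt.month_name()

# First three letters serve as the usual English abbreviation.
short_names = full_names.str.slice(stop=3)

print(full_names.tolist())
print(short_names.tolist())
```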

In [11]:
df

| Unnamed: 0 | date       | number_of_strikes | center_point_geom | month | month_name | month_name1 |
|------------|------------|-------------------|-------------------|-------|------------|-------------|
| 0          | 2018-01-03 | 194               | POINT(-75 27)     | 1     | January    | Jan         |
| 1          | 2018-01-03 | 41                | POINT(-78.4 29)   | 1     | January    | Jan         |
| 2          | 2018-01-03 | 33                | POINT(-73.9 27)   | 1     | January    | Jan         |
| 3          | 2018-01-03 | 38                | POINT(-73.8 27)   | 1     | January    | Jan         |
| 4          | 2018-01-03 | 92                | POINT(-79 28)     | 1     | January    | Jan         |
| ...        | ...        | ...               | ...               | ...   | ...        | ...         |
| 3401007    | 2018-12-28 | 30                | POINT(-90.6 28.7) | 12    | December   | Dec         |
| 3401008    | 2018-12-28 | 30                | POINT(-89.4 30.9) | 12    | December   | Dec         |
| 3401009    | 2018-12-28 | 30                | POINT(-89.5 31.4) | 12    | December   | Dec         |
| 3401010    | 2018-12-28 | 30                | POINT(-88.3 31.6) | 12    | December   | Dec         |


Notice that the data is structured as one row per day per location: each row records the number of strikes on a single date along with the geographic center point of those strikes.

![image.png](attachment:image.png)
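
The overview describes totaling the strikes for each month and plotting the result as a bar graph. A minimal sketch of that step follows, using a hypothetical `toy` frame with invented values rather than the NOAA data. It groups on the numeric `month` as well as `month_name1` so the bars appear in calendar order rather than alphabetical order.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the lightning data (values invented for illustration).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-03", "2018-01-03", "2018-02-10", "2018-02-11"]),
    "number_of_strikes": [194, 41, 30, 30],
})
toy["month"] = toy["date"].dt.month
toy["month_name1"] = toy["date"].dt.strftime("%b")

# Sum strikes per month; grouping on the month number keeps calendar order.
monthly = (
    toy.groupby(["month", "month_name1"])["number_of_strikes"]
    .sum()
    .reset_index()
)
print(monthly)

# Bar graph of the monthly totals.
fig, ax = plt.subplots()
ax.bar(monthly["month_name1"], monthly["number_of_strikes"])
ax.set_xlabel("Month")
ax.set_ylabel("Number of strikes")
ax.set_title("Lightning strikes per month (toy data)")
plt.show()
```

On the full DataFrame, the same `groupby(...).sum()` pattern would collapse the roughly 3.4 million daily rows into one total per month.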