# About Dataset
## Context
Eclipses of the sun can only occur when the moon is near one of its two orbital nodes during the new moon phase. It is then possible for the moon's penumbral, umbral, or antumbral shadows to sweep across Earth's surface, thereby producing an eclipse. There are four types of solar eclipses: a partial eclipse, during which the moon's penumbral shadow traverses Earth while its umbral and antumbral shadows completely miss Earth; an annular eclipse, during which the moon's antumbral shadow traverses Earth but the moon does not completely cover the sun; a total eclipse, during which the moon's umbral shadow traverses Earth and the moon completely covers the sun; and a hybrid eclipse, during which the moon's umbral and antumbral shadows traverse Earth, so that annular and total eclipses are visible in different locations. Earth will experience 11898 solar eclipses during the five-millennium period -1999 to +3000 (2000 BCE to 3000 CE).

Eclipses of the moon can occur when the moon is near one of its two orbital nodes during the full moon phase. It is then possible for the moon to pass through Earth's penumbral or umbral shadows, thereby producing an eclipse. There are three types of lunar eclipses: a penumbral eclipse, during which the moon traverses Earth's penumbral shadow but misses its umbral shadow; a partial eclipse, during which the moon traverses Earth's penumbral and umbral shadows; and a total eclipse, during which the moon traverses Earth's penumbral and umbral shadows and passes completely into Earth's umbra. Earth will experience 12064 lunar eclipses during the five-millennium period -1999 to +3000 (2000 BCE to 3000 CE).

## Acknowledgements
Lunar eclipse predictions were produced by Fred Espenak from NASA's Goddard Space Flight Center.

## Import Libraries

In [163]:
import pandas as pd
import numpy as np

## Read Data

In [164]:
df = pd.read_csv('solar.csv')
df

Unnamed: 0,Catalog Number,Calendar Date,Eclipse Time,Delta T (s),Lunation Number,Saros Number,Eclipse Type,Gamma,Eclipse Magnitude,Latitude,Longitude,Sun Altitude,Sun Azimuth,Path Width (km),Central Duration
0,1,-1999 June 12,03:14:51,46438,-49456,5,T,-0.2701,1.0733,6.0N,33.3W,74,344,247,06m37s
1,2,-1999 December 5,23:45:23,46426,-49450,10,A,-0.2317,0.9382,32.9S,10.8E,76,21,236,06m44s
2,3,-1998 June 1,18:09:16,46415,-49444,15,T,0.4994,1.0284,46.2N,83.4E,60,151,111,02m15s
3,4,-1998 November 25,05:57:03,46403,-49438,20,A,-0.9045,0.9806,67.8S,143.8W,25,74,162,01m14s
4,5,-1997 April 22,13:19:56,46393,-49433,-13,P,-1.4670,0.1611,60.6S,106.4W,0,281,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11893,11894,2998 December 10,03:18:31,4414,12355,187,P,1.2838,0.4773,67.2N,145.0E,0,179,,
11894,11895,2999 May 6,23:23:57,4417,12360,154,T,0.8388,1.0566,71.5N,177.3E,33,146,345,03m25s
11895,11896,2999 October 30,09:34:33,4420,12366,159,A-,-1.0023,0.9586,70.9S,84.7W,0,137,-,-
11896,11897,3000 April 26,14:18:06,4424,12372,164,T,0.1310,1.0222,21.1N,18.4W,82,166,76,02m11s


In [165]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11898 entries, 0 to 11897
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Catalog Number     11898 non-null  int64  
 1   Calendar Date      11898 non-null  object 
 2   Eclipse Time       11898 non-null  object 
 3   Delta T (s)        11898 non-null  int64  
 4   Lunation Number    11898 non-null  int64  
 5   Saros Number       11898 non-null  int64  
 6   Eclipse Type       11898 non-null  object 
 7   Gamma              11898 non-null  float64
 8   Eclipse Magnitude  11898 non-null  float64
 9   Latitude           11898 non-null  object 
 10  Longitude          11898 non-null  object 
 11  Sun Altitude       11898 non-null  int64  
 12  Sun Azimuth        11898 non-null  int64  
 13  Path Width (km)    7698 non-null   object 
 14  Central Duration   7698 non-null   object 
dtypes: float64(2), int64(6), object(7)
memory usage: 1.4+ MB


## Change Eclipse Type 

In [166]:
df['Eclipse Type'].value_counts()

Eclipse Type
P     3875
A     3755
T     3049
H      502
Pb     163
Pe     162
Am      72
Tm      72
An      36
A+      34
A-      34
H3      26
As      25
H2      24
Hm      17
T-      17
Tn      14
Ts      12
T+       9
Name: count, dtype: int64

In [167]:
df['Eclipse Type'] = df['Eclipse Type'].replace({
    'P': 'Partial eclipse',
    'A': 'Annular eclipse',
    'T': 'Total eclipse',
    'H': 'Hybrid eclipse',
    'Pb': 'Partial eclipse',
    'Pe': 'Partial eclipse',
    'Am': 'Annular eclipse',
    'Tm': 'Total eclipse',
    'An': 'Annular eclipse',
    'A+': 'Annular eclipse',
    'A-': 'Annular eclipse',
    'H3': 'Hybrid eclipse',
    'As': 'Annular eclipse',
    'H2': 'Hybrid eclipse',
    'Hm': 'Hybrid eclipse',
    'T-': 'Total eclipse',
    'Tn': 'Total eclipse',
    'Ts': 'Total eclipse',
    'T+': 'Total eclipse'
})


In [168]:
df['Eclipse Type'].value_counts()

Eclipse Type
Partial eclipse    4200
Annular eclipse    3956
Total eclipse      3173
Hybrid eclipse      569
Name: count, dtype: int64
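In every code listed above, the first letter alone identifies the base type (P, A, T, or H), so the 19-entry dictionary could be shortened. The following is a minimal sketch under that assumption, run on a small sample of codes copied from the counts above rather than on the full file; `codes`, `base_names`, and `labels` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Sample of raw type codes taken from the value_counts() output above.
codes = pd.Series(['P', 'A', 'T', 'H', 'Pb', 'Am', 'A-', 'H3', 'T+'],
                  name='Eclipse Type')

# Assumption: the first character of each code determines the base type.
base_names = {
    'P': 'Partial eclipse',
    'A': 'Annular eclipse',
    'T': 'Total eclipse',
    'H': 'Hybrid eclipse',
}

# Take the first character and look it up; unknown letters would become NaN.
labels = codes.str[0].map(base_names)
print(labels.value_counts())
```

A code whose first letter is not in `base_names` would map to NaN, which makes unexpected codes easy to spot with `labels.isnull().sum()`.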

## Rename the columns to a consistent hyphenated format

In [169]:
df.columns

Index(['Catalog Number', 'Calendar Date', 'Eclipse Time', 'Delta T (s)',
       'Lunation Number', 'Saros Number', 'Eclipse Type', 'Gamma',
       'Eclipse Magnitude', 'Latitude', 'Longitude', 'Sun Altitude',
       'Sun Azimuth', 'Path Width (km)', 'Central Duration'],
      dtype='object')

In [170]:
df = df.rename(columns={
    'Catalog Number': 'Catalog-Number',
    'Calendar Date': 'Calendar-Date',
    'Eclipse Time': 'Eclipse-Time',
    'Delta T (s)': 'Delta-T(s)',
    'Lunation Number': 'Lunation-Number',
    'Saros Number': 'Saros-Number',
    'Eclipse Type': 'Eclipse-Type',
    'Gamma': 'Gamma',
    'Eclipse Magnitude': 'Eclipse-Magnitude',
    'Latitude': 'Latitude',
    'Longitude': 'Longitude',
    'Sun Altitude': 'Sun-Altitude',
    'Sun Azimuth': 'Sun-Azimuth',
    'Path Width (km)': 'Path-Width(km)',
    'Central Duration': 'Central-Duration'
})


In [171]:
df.columns

Index(['Catalog-Number', 'Calendar-Date', 'Eclipse-Time', 'Delta-T(s)',
       'Lunation-Number', 'Saros-Number', 'Eclipse-Type', 'Gamma',
       'Eclipse-Magnitude', 'Latitude', 'Longitude', 'Sun-Altitude',
       'Sun-Azimuth', 'Path-Width(km)', 'Central-Duration'],
      dtype='object')
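The same renaming can also be expressed as a rule instead of an explicit dictionary. This is a sketch only, assuming the pattern seen above holds for every column: the space before a parenthesised unit is dropped and every remaining space becomes a hyphen. It is shown on a handful of the column names; `columns` and `renamed` are illustrative names.

```python
import pandas as pd

# A few of the original column names from df.columns above.
columns = pd.Index(['Catalog Number', 'Delta T (s)', 'Gamma',
                    'Path Width (km)', 'Central Duration'])

# First remove the space before a unit in parentheses,
# then replace the remaining spaces with hyphens.
renamed = (columns
           .str.replace(' (', '(', regex=False)
           .str.replace(' ', '-', regex=False))
print(list(renamed))
```

Applied to the real frame, the result could be assigned back with `df.columns = renamed` once it is computed from `df.columns`.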

In [172]:
df.isnull().sum()

Catalog-Number          0
Calendar-Date           0
Eclipse-Time            0
Delta-T(s)              0
Lunation-Number         0
Saros-Number            0
Eclipse-Type            0
Gamma                   0
Eclipse-Magnitude       0
Latitude                0
Longitude               0
Sun-Altitude            0
Sun-Azimuth             0
Path-Width(km)       4200
Central-Duration     4200
dtype: int64

### After searching, we found that path width only applies to central eclipses (total, annular, or hybrid), not to partial eclipses, so we will replace the missing values with "Not Applicable"

In [173]:
df[['Path-Width(km)', 'Central-Duration']].isnull().sum()


Path-Width(km)      4200
Central-Duration    4200
dtype: int64
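The statement that the missing values belong only to partial eclipses can be checked rather than assumed. The sketch below uses a hypothetical four-row sample whose values are copied from rows displayed earlier, not the full file; `sample`, `missing`, `check`, and `all_partial` are illustrative names.

```python
import numpy as np
import pandas as pd

# Four rows copied from the table shown earlier (two central, two partial).
sample = pd.DataFrame({
    'Eclipse-Type': ['Total eclipse', 'Annular eclipse',
                     'Partial eclipse', 'Partial eclipse'],
    'Path-Width(km)': ['247', '236', np.nan, np.nan],
    'Central-Duration': ['06m37s', '06m44s', np.nan, np.nan],
})

# Boolean flag: True where the path width is missing.
missing = sample['Path-Width(km)'].isnull()

# Cross-tabulate eclipse type against missingness.
check = pd.crosstab(sample['Eclipse-Type'], missing)
print(check)

# True only if every row with a missing width is a partial eclipse.
all_partial = (sample.loc[missing, 'Eclipse-Type'] == 'Partial eclipse').all()
print(all_partial)
```

On the full dataset, the 4200 missing values match the 4200 partial eclipses counted earlier, which is consistent with this reading.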

In [174]:
df[['Path-Width(km)', 'Central-Duration']] = df[['Path-Width(km)', 'Central-Duration']].fillna("Not Applicable")

In [175]:
df.isnull().sum()

Catalog-Number       0
Calendar-Date        0
Eclipse-Time         0
Delta-T(s)           0
Lunation-Number      0
Saros-Number         0
Eclipse-Type         0
Gamma                0
Eclipse-Magnitude    0
Latitude             0
Longitude            0
Sun-Altitude         0
Sun-Azimuth          0
Path-Width(km)       0
Central-Duration     0
dtype: int64

In [176]:
df

Unnamed: 0,Catalog-Number,Calendar-Date,Eclipse-Time,Delta-T(s),Lunation-Number,Saros-Number,Eclipse-Type,Gamma,Eclipse-Magnitude,Latitude,Longitude,Sun-Altitude,Sun-Azimuth,Path-Width(km),Central-Duration
0,1,-1999 June 12,03:14:51,46438,-49456,5,Total eclipse,-0.2701,1.0733,6.0N,33.3W,74,344,247,06m37s
1,2,-1999 December 5,23:45:23,46426,-49450,10,Annular eclipse,-0.2317,0.9382,32.9S,10.8E,76,21,236,06m44s
2,3,-1998 June 1,18:09:16,46415,-49444,15,Total eclipse,0.4994,1.0284,46.2N,83.4E,60,151,111,02m15s
3,4,-1998 November 25,05:57:03,46403,-49438,20,Annular eclipse,-0.9045,0.9806,67.8S,143.8W,25,74,162,01m14s
4,5,-1997 April 22,13:19:56,46393,-49433,-13,Partial eclipse,-1.4670,0.1611,60.6S,106.4W,0,281,Not Applicable,Not Applicable
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11893,11894,2998 December 10,03:18:31,4414,12355,187,Partial eclipse,1.2838,0.4773,67.2N,145.0E,0,179,Not Applicable,Not Applicable
11894,11895,2999 May 6,23:23:57,4417,12360,154,Total eclipse,0.8388,1.0566,71.5N,177.3E,33,146,345,03m25s
11895,11896,2999 October 30,09:34:33,4420,12366,159,Annular eclipse,-1.0023,0.9586,70.9S,84.7W,0,137,-,-
11896,11897,3000 April 26,14:18:06,4424,12372,164,Total eclipse,0.1310,1.0222,21.1N,18.4W,82,166,76,02m11s


## Replace the dashes with Not Applicable

In [177]:
(df[['Path-Width(km)', 'Central-Duration']]=='-').sum()

Path-Width(km)      181
Central-Duration     94
dtype: int64

In [178]:
df[['Path-Width(km)', 'Central-Duration']] = df[['Path-Width(km)', 'Central-Duration']].replace('-', 'Not Applicable')

In [179]:
df

Unnamed: 0,Catalog-Number,Calendar-Date,Eclipse-Time,Delta-T(s),Lunation-Number,Saros-Number,Eclipse-Type,Gamma,Eclipse-Magnitude,Latitude,Longitude,Sun-Altitude,Sun-Azimuth,Path-Width(km),Central-Duration
0,1,-1999 June 12,03:14:51,46438,-49456,5,Total eclipse,-0.2701,1.0733,6.0N,33.3W,74,344,247,06m37s
1,2,-1999 December 5,23:45:23,46426,-49450,10,Annular eclipse,-0.2317,0.9382,32.9S,10.8E,76,21,236,06m44s
2,3,-1998 June 1,18:09:16,46415,-49444,15,Total eclipse,0.4994,1.0284,46.2N,83.4E,60,151,111,02m15s
3,4,-1998 November 25,05:57:03,46403,-49438,20,Annular eclipse,-0.9045,0.9806,67.8S,143.8W,25,74,162,01m14s
4,5,-1997 April 22,13:19:56,46393,-49433,-13,Partial eclipse,-1.4670,0.1611,60.6S,106.4W,0,281,Not Applicable,Not Applicable
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11893,11894,2998 December 10,03:18:31,4414,12355,187,Partial eclipse,1.2838,0.4773,67.2N,145.0E,0,179,Not Applicable,Not Applicable
11894,11895,2999 May 6,23:23:57,4417,12360,154,Total eclipse,0.8388,1.0566,71.5N,177.3E,33,146,345,03m25s
11895,11896,2999 October 30,09:34:33,4420,12366,159,Annular eclipse,-1.0023,0.9586,70.9S,84.7W,0,137,Not Applicable,Not Applicable
11896,11897,3000 April 26,14:18:06,4424,12372,164,Total eclipse,0.1310,1.0222,21.1N,18.4W,82,166,76,02m11s
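Storing "Not Applicable" as text keeps `Path-Width(km)` an object column, so numeric operations on it will not work directly. If the width is needed for calculations, one alternative (not what this notebook does) is to derive a numeric copy in which both the missing values and the dashes become NaN. A minimal sketch on a four-value sample taken from the rows above; `path_width` and `numeric_width` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample values as they appear in the raw column: numbers as text,
# a missing value, and a dash.
path_width = pd.Series(['247', np.nan, '-', '76'], name='Path-Width(km)')

# errors='coerce' turns anything that is not a number into NaN.
numeric_width = pd.to_numeric(path_width, errors='coerce')
print(numeric_width)
print(numeric_width.mean())
```

The same call would also coerce the string "Not Applicable" to NaN, so it could be applied after the replacement step as well.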


## Change the Central-Duration values to HH:MM:SS format

In [180]:
mask = df['Central-Duration'] != 'Not Applicable'

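def convert_duration(duration_str):
    # Split a value such as '06m37s' into its minute and second parts
    minutes, seconds = duration_str.rstrip('s').split('m')
    # Central durations are far shorter than an hour, so the hour field is always 00
    return f"00:{int(minutes):02d}:{int(seconds):02d}"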

df.loc[mask, 'Central-Duration'] = df.loc[mask, 'Central-Duration'].apply(convert_duration)

In [181]:
df

Unnamed: 0,Catalog-Number,Calendar-Date,Eclipse-Time,Delta-T(s),Lunation-Number,Saros-Number,Eclipse-Type,Gamma,Eclipse-Magnitude,Latitude,Longitude,Sun-Altitude,Sun-Azimuth,Path-Width(km),Central-Duration
0,1,-1999 June 12,03:14:51,46438,-49456,5,Total eclipse,-0.2701,1.0733,6.0N,33.3W,74,344,247,00:06:37
1,2,-1999 December 5,23:45:23,46426,-49450,10,Annular eclipse,-0.2317,0.9382,32.9S,10.8E,76,21,236,00:06:44
2,3,-1998 June 1,18:09:16,46415,-49444,15,Total eclipse,0.4994,1.0284,46.2N,83.4E,60,151,111,00:02:15
3,4,-1998 November 25,05:57:03,46403,-49438,20,Annular eclipse,-0.9045,0.9806,67.8S,143.8W,25,74,162,00:01:14
4,5,-1997 April 22,13:19:56,46393,-49433,-13,Partial eclipse,-1.4670,0.1611,60.6S,106.4W,0,281,Not Applicable,Not Applicable
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11893,11894,2998 December 10,03:18:31,4414,12355,187,Partial eclipse,1.2838,0.4773,67.2N,145.0E,0,179,Not Applicable,Not Applicable
11894,11895,2999 May 6,23:23:57,4417,12360,154,Total eclipse,0.8388,1.0566,71.5N,177.3E,33,146,345,00:03:25
11895,11896,2999 October 30,09:34:33,4420,12366,159,Annular eclipse,-1.0023,0.9586,70.9S,84.7W,0,137,Not Applicable,Not Applicable
11896,11897,3000 April 26,14:18:06,4424,12372,164,Total eclipse,0.1310,1.0222,21.1N,18.4W,82,166,76,00:02:11
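Once the durations are in HH:MM:SS form, they can be turned into real time values for sorting or arithmetic. The sketch below is one possible follow-up, not part of the notebook: it uses a small sample copied from the output above and relies on `errors='coerce'` to map "Not Applicable" to NaT; `durations`, `as_timedelta`, and `seconds` are illustrative names.

```python
import pandas as pd

# Sample of converted values, including one non-central eclipse.
durations = pd.Series(['00:06:37', '00:06:44', 'Not Applicable', '00:02:11'],
                      name='Central-Duration')

# Parse HH:MM:SS strings; text that cannot be parsed becomes NaT.
as_timedelta = pd.to_timedelta(durations, errors='coerce')

# Duration in seconds as a float column (NaN where not applicable).
seconds = as_timedelta.dt.total_seconds()
print(seconds)
```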
