#### The Data

The dataset contains the following variables:

1. `name`: the name of the place where a meteorite was found or observed.

2. `id`: a unique identifier for a meteorite.

3. `nametype`: one of the following:
    
    - `Valid`: a typical meteorite.
    
    - `Relict`: a meteorite that has been highly degraded by weathering on Earth.

4. `recclass`: the class of the meteorite; one of a large number of classes based on physical, chemical, and other characteristics.

5. `mass`: the mass of the meteorite, in grams.

6. `fall`: whether the meteorite was seen falling, or was discovered after its impact; one of the following:

    - `Fell`: the meteorite's fall was observed.
    
    - `Found`: the meteorite's fall was not observed.

7. `year`: the year the meteorite fell, or the year it was found (depending on the value of `fall`).

8. `reclat`: the latitude of the meteorite's landing.

9. `reclong`: the longitude of the meteorite's landing.

10. `GeoLocation`: a parentheses-enclosed, comma-separated tuple that combines the `reclat` and `reclong` values.
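
Because `GeoLocation` repeats `reclat` and `reclong` as a single string, it can be split back into two numbers when needed. A minimal sketch, assuming every value follows the `"(latitude, longitude)"` format seen in the sample rows; `parse_geolocation` is a hypothetical helper, not part of the notebook:

```python
def parse_geolocation(text):
    """Split a string such as "(50.775000, 6.083330)" into (lat, long) floats.

    Hypothetical helper: assumes the "(lat, long)" format of the GeoLocation column.
    """
    # Drop the surrounding parentheses, then split on the single comma.
    lat_text, long_text = text.strip("()").split(",")
    # float() ignores the space left after the comma.
    return float(lat_text), float(long_text)


print(parse_geolocation("(50.775000, 6.083330)"))  # (50.775, 6.08333)
```

Applied to the real column, this would look something like `met_df['GeoLocation'].dropna().map(parse_geolocation)`, where `dropna()` skips any rows without a location string.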


---

In [None]:
# Import the 'numpy', 'pandas' and 'matplotlib.pyplot' libraries for this class and create a DataFrame.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

met_df = pd.read_csv('https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/meteorite-landings/meteorite-landings.csv')
met_df.head()

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.775,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.0,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.9,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95,"(-33.166670, -64.950000)"


In [None]:
# Number of rows and columns in the DataFrame.
met_df.shape

(45716, 10)

---

In [None]:
# Descriptive statistics for the 'year' values in the 'met_df' DataFrame.
met_df['year'].describe()

count    45428.000000
mean      1991.772189
std         27.181247
min        301.000000
25%       1987.000000
50%       1998.000000
75%       2003.000000
max       2501.000000
Name: year, dtype: float64

---

In [None]:
# Keep only the rows whose 'year' values lie between 860 and 2016 (both inclusive).
correct_years_df = met_df[(met_df['year'] >= 860) & (met_df['year'] <= 2016)]
correct_years_df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.77500,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.00000,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.90000,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95000,"(-33.166670, -64.950000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.0,Found,1990.0,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.0,Found,1999.0,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.3,Found,1939.0,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.0,Found,2003.0,49.78917,41.50460,"(49.789170, 41.504600)"
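
The same year filter can also be written with `Series.between()`, which includes both endpoints by default; in the notebook, that would be `met_df[met_df['year'].between(860, 2016)]`. A minimal sketch on a small made-up DataFrame (the values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy data: one year below 860, one above 2016 and one missing year.
toy_df = pd.DataFrame({"name": ["A", "B", "C", "D", "E"],
                       "year": [301.0, 860.0, 1998.0, 2501.0, np.nan]})

# between() is inclusive on both ends by default, matching >= 860 & <= 2016.
mask = toy_df["year"].between(860, 2016)
filtered_df = toy_df[mask]
print(filtered_df["name"].tolist())  # ['B', 'C']
```

Note that the missing year (`NaN`) is dropped as well, because `NaN` never satisfies a comparison.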


---

#### Removing The Invalid `reclong` Values


In [None]:
# Rows having the 'reclong' values greater than or equal to -180 degrees and less than or equal to 180 degrees.
correct_long_df = correct_years_df[(correct_years_df['reclong'] >= -180) & (correct_years_df['reclong'] <= 180)]
correct_long_df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.77500,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.00000,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.90000,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95000,"(-33.166670, -64.950000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.0,Found,1990.0,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.0,Found,1999.0,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.3,Found,1939.0,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.0,Found,2003.0,49.78917,41.50460,"(49.789170, 41.504600)"
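
These two comparisons also silently remove rows whose `reclong` is missing, because any comparison with `NaN` evaluates to `False`. A small illustration with made-up longitudes:

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({"reclong": [6.08, 354.47, -113.0, np.nan]})

# NaN >= -180 and NaN <= 180 are both False, so the NaN row is dropped too.
in_range = (toy_df["reclong"] >= -180) & (toy_df["reclong"] <= 180)
print(in_range.tolist())  # [True, False, True, False]
```

This is why the later `isnull()` check reports no missing `reclong` values.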


---

#### Removing The Rows Containing `0 N, 0 E` Values


In [None]:
# Remove the rows whose 'reclat' and 'reclong' values are both 0 from the 'correct_long_df' DataFrame.
correct_lat_long_df = correct_long_df[~((correct_long_df['reclat'] == 0) & (correct_long_df['reclong'] == 0))]
correct_lat_long_df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.77500,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.00000,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.90000,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95000,"(-33.166670, -64.950000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.0,Found,1990.0,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.0,Found,1999.0,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.3,Found,1939.0,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.0,Found,2003.0,49.78917,41.50460,"(49.789170, 41.504600)"
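
The `~` operator negates the *combined* condition, so a row is dropped only when both `reclat` and `reclong` are 0; a point that lies only on the equator or only on the prime meridian is kept. A small illustration with made-up coordinates:

```python
import pandas as pd

toy_df = pd.DataFrame({"reclat": [0.0, 0.0, 50.775, 0.0],
                       "reclong": [0.0, 6.08333, 0.0, 0.0]})

# True only where BOTH coordinates are exactly 0 (the "0 N, 0 E" rows).
at_origin = (toy_df["reclat"] == 0) & (toy_df["reclong"] == 0)

# ~ flips the mask, keeping every row that is not at 0 N, 0 E.
kept_df = toy_df[~at_origin]
print(kept_df.index.tolist())  # [1, 2]
```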


---

#### Check For The Missing Values



In [None]:
# Check whether the 'correct_lat_long_df' DataFrame has missing values or not.
correct_lat_long_df.isnull().sum()

name             0
id               0
nametype         0
recclass         0
mass           107
fall             0
year             0
reclat           0
reclong          0
GeoLocation      0
dtype: int64

In [None]:
# Rows containing the missing 'mass' values in the 'correct_lat_long_df' DataFrame.
correct_lat_long_df[correct_lat_long_df['mass'].isnull()]

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
12,Aire-sur-la-Lys,425,Valid,Unknown,,Fell,1769.0,50.66667,2.33333,"(50.666670, 2.333330)"
38,Angers,2301,Valid,L6,,Fell,1822.0,47.46667,-0.55000,"(47.466670, -0.550000)"
76,Barcelona (stone),4944,Valid,OC,,Fell,1704.0,41.36667,2.16667,"(41.366670, 2.166670)"
93,Belville,5009,Valid,OC,,Fell,1937.0,-32.33333,-64.86667,"(-32.333330, -64.866670)"
172,Castel Berardenga,5292,Valid,Stone-uncl,,Fell,1791.0,43.35000,11.50000,"(43.350000, 11.500000)"
...,...,...,...,...,...,...,...,...,...,...
31097,Palermo,18076,Valid,Unknown,,Found,1966.0,-34.55000,-58.43333,"(-34.550000, -58.433330)"
36812,San Luis,23129,Valid,H,,Found,1964.0,-33.33333,-66.38333,"(-33.333330, -66.383330)"
38278,Weiyuan,24233,Valid,Mesosiderite,,Found,1978.0,35.26667,104.31667,"(35.266670, 104.316670)"
41460,Yamato 792768,28117,Valid,CM2,,Found,1979.0,-71.50000,35.66667,"(-71.500000, 35.666670)"


In [None]:
# Descriptive statistics for the 'mass' column in the 'correct_lat_long_df' DataFrame.
correct_lat_long_df['mass'].describe()

count    3.192900e+04
mean     1.854289e+04
std      6.868495e+05
min      0.000000e+00
25%      6.500000e+00
50%      2.960000e+01
75%      2.020000e+02
max      6.000000e+07
Name: mass, dtype: float64

In [None]:
# Indices of the rows with missing 'mass' values.
row_indices = correct_lat_long_df[correct_lat_long_df['mass'].isnull()].index
row_indices

Int64Index([   12,    38,    76,    93,   172,   204,   262,   308,   312,
              320,
            ...
            31055, 31056, 31057, 31058, 31059, 31097, 36812, 38278, 41460,
            45698],
           dtype='int64', length=107)

---

In [None]:
# Missing 'mass' values from the 'correct_lat_long_df' DataFrame, selected with the 'loc[]' indexer.
missing_mass_values = correct_lat_long_df.loc[row_indices, 'mass']
missing_mass_values

12      NaN
38      NaN
76      NaN
93      NaN
172     NaN
         ..
31097   NaN
36812   NaN
38278   NaN
41460   NaN
45698   NaN
Name: mass, Length: 107, dtype: float64

#### Replacing The Missing `mass` Values

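**Note:** The code below raises a warning because `correct_lat_long_df` was created by filtering another DataFrame (`correct_long_df`). The assignment still updates `correct_lat_long_df`, as the next cell confirms, so the warning can be ignored here.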


In [None]:
# Replace the missing values in the 'mass' column of the 'correct_lat_long_df' DataFrame with the median mass.
correct_lat_long_df.loc[row_indices, 'mass'] = correct_lat_long_df['mass'].median()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value)
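
One way to avoid this warning (pandas' `SettingWithCopyWarning`) is to take an explicit `.copy()` of the filtered rows before modifying them; `fillna()` can then replace the missing masses in one step. A minimal sketch on made-up data; `source_df` and `clean_df` are illustrative names, not the notebook's variables:

```python
import numpy as np
import pandas as pd

source_df = pd.DataFrame({"reclat": [50.775, 0.0, 56.18333, 54.21667],
                          "reclong": [6.08333, 0.0, 10.23333, -113.0],
                          "mass": [21.0, 5.0, np.nan, 107000.0]})

# .copy() makes the filtered result an independent DataFrame,
# so later assignments on it do not trigger SettingWithCopyWarning.
at_origin = (source_df["reclat"] == 0) & (source_df["reclong"] == 0)
clean_df = source_df[~at_origin].copy()

# Replace the missing masses with the median of the remaining masses
# (median() skips NaN values).
median_mass = clean_df["mass"].median()
clean_df["mass"] = clean_df["mass"].fillna(median_mass)
print(clean_df["mass"].tolist())  # [21.0, 53510.5, 107000.0]
```

In the notebook, adding `.copy()` at the point where `correct_lat_long_df` is created should have the same effect.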


In [None]:
# Check whether all the missing mass values have been replaced by the median of the mass values or not.
correct_lat_long_df.loc[row_indices, :]

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
12,Aire-sur-la-Lys,425,Valid,Unknown,29.6,Fell,1769.0,50.66667,2.33333,"(50.666670, 2.333330)"
38,Angers,2301,Valid,L6,29.6,Fell,1822.0,47.46667,-0.55000,"(47.466670, -0.550000)"
76,Barcelona (stone),4944,Valid,OC,29.6,Fell,1704.0,41.36667,2.16667,"(41.366670, 2.166670)"
93,Belville,5009,Valid,OC,29.6,Fell,1937.0,-32.33333,-64.86667,"(-32.333330, -64.866670)"
172,Castel Berardenga,5292,Valid,Stone-uncl,29.6,Fell,1791.0,43.35000,11.50000,"(43.350000, 11.500000)"
...,...,...,...,...,...,...,...,...,...,...
31097,Palermo,18076,Valid,Unknown,29.6,Found,1966.0,-34.55000,-58.43333,"(-34.550000, -58.433330)"
36812,San Luis,23129,Valid,H,29.6,Found,1964.0,-33.33333,-66.38333,"(-33.333330, -66.383330)"
38278,Weiyuan,24233,Valid,Mesosiderite,29.6,Found,1978.0,35.26667,104.31667,"(35.266670, 104.316670)"
41460,Yamato 792768,28117,Valid,CM2,29.6,Found,1979.0,-71.50000,35.66667,"(-71.500000, 35.666670)"


In [None]:
# Descriptive statistics for the 'mass' column in the above DataFrame containing 107 rows.
correct_lat_long_df.loc[row_indices, 'mass'].describe()

count    1.070000e+02
mean     2.960000e+01
std      5.711092e-14
min      2.960000e+01
25%      2.960000e+01
50%      2.960000e+01
75%      2.960000e+01
max      2.960000e+01
Name: mass, dtype: float64

In [None]:
# Descriptive statistics for the 'mass' column in the 'correct_lat_long_df' DataFrame.
correct_lat_long_df['mass'].describe()

count    3.203600e+04
mean     1.848105e+04
std      6.857023e+05
min      0.000000e+00
25%      6.500000e+00
50%      2.960000e+01
75%      2.006500e+02
max      6.000000e+07
Name: mass, dtype: float64

---

In [None]:
# Convert the 'year' values into integers.
correct_lat_long_df['year'] = correct_lat_long_df['year'].astype(int)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Now, let's confirm that the `year` values were converted to integers by checking the column's `dtype` attribute, which we learnt about in the NumPy arrays class.

In [None]:
# confirm whether the 'year' values are integers using the 'dtype' attribute.
correct_lat_long_df['year'].dtype

dtype('int64')
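
The cast succeeds because the earlier year filter left no missing (`NaN`) `year` values, and `NaN` has no integer representation. A small sketch with made-up values showing both cases; only `ValueError` is caught, since the exact exception subclass and message vary across pandas versions:

```python
import numpy as np
import pandas as pd

clean_years = pd.Series([1880.0, 1951.0, 2003.0])
print(clean_years.astype(int).tolist())  # [1880, 1951, 2003]

years_with_gap = pd.Series([1880.0, np.nan, 2003.0])
cast_failed = False
try:
    years_with_gap.astype(int)
except ValueError as error:
    # NaN cannot be stored in a plain integer column, so the cast fails.
    cast_failed = True
    print("Cannot cast:", error)
```

If missing years had to be kept, pandas' nullable `"Int64"` dtype (`astype("Int64")`) can store them as `<NA>`.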

---

In [None]:
# find out how many meteorites are in good condition ('nametype' == 'Valid') and how many are withered ('nametype' == 'Relict').
correct_lat_long_df['nametype'].value_counts()

Valid     31967
Relict       69
Name: nametype, dtype: int64

In [None]:
# create a DataFrame called 'found_relict_df' that stores only the withered ('Relict') meteorites that were found rather than seen falling.
found_relict_df = correct_lat_long_df[(correct_lat_long_df['fall'] == 'Found')&(correct_lat_long_df['nametype'] == 'Relict')]
found_relict_df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
5182,Brunflo,5157,Relict,Relict H,29.6,Found,1980,63.11667,14.28333,"(63.116670, 14.283330)"
6810,David Glacier 92308,6614,Relict,Chondrite-fusion crust,1.7,Found,1992,-75.31667,162.00000,"(-75.316670, 162.000000)"
12627,Gove,52859,Relict,Relict iron,0.0,Found,1979,-12.26333,136.83833,"(-12.263330, 136.838330)"
15944,Gullhögen 001,44889,Relict,Relict OC,29.6,Found,2000,58.38333,13.80000,"(58.383330, 13.800000)"
20674,Lewis Cliff 87241,13702,Relict,Chondrite-fusion crust,0.5,Found,1987,-84.34563,161.31058,"(-84.345630, 161.310580)"
...,...,...,...,...,...,...,...,...,...,...
31072,Österplana 060,56159,Relict,Relict OC,0.0,Found,2009,58.58333,13.43333,"(58.583330, 13.433330)"
31073,Österplana 061,56160,Relict,Relict OC,0.0,Found,2009,58.58333,13.43333,"(58.583330, 13.433330)"
31074,Österplana 062,56161,Relict,Relict OC,0.0,Found,2010,58.58333,13.43333,"(58.583330, 13.433330)"
31075,Österplana 063,56162,Relict,Relict OC,0.0,Found,2010,58.58333,13.43333,"(58.583330, 13.433330)"


In [None]:
# create a DataFrame called 'found_valid_df' that stores only the good-condition ('Valid') meteorites that were found rather than seen falling.
found_valid_df = correct_lat_long_df[(correct_lat_long_df['fall'] == 'Found')&(correct_lat_long_df['nametype'] == 'Valid')]
found_valid_df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
1108,Abajo,4,Valid,H5,331.00,Found,1982,26.80000,-105.41667,"(26.800000, -105.416670)"
1109,Abar al' Uj 001,51399,Valid,H3.8,194.34,Found,2008,22.72192,48.95937,"(22.721920, 48.959370)"
1110,Abbott,5,Valid,H3-6,21100.00,Found,1951,36.30000,-104.28333,"(36.300000, -104.283330)"
1111,Abernathy,7,Valid,L6,2914.00,Found,1941,33.85000,-101.80000,"(33.850000, -101.800000)"
1112,Abo,8,Valid,H,1.20,Found,1840,60.43333,22.30000,"(60.433330, 22.300000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.00,Found,1990,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.00,Found,1999,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.30,Found,1939,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.00,Found,2003,49.78917,41.50460,"(49.789170, 41.504600)"


So there are 30,871 meteorites which were found in good condition.
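
`pd.crosstab()` counts every combination of `fall` and `nametype` in one table, so the 30,871 figure could also be read from the `Found`/`Valid` cell of `pd.crosstab(correct_lat_long_df['fall'], correct_lat_long_df['nametype'])`, or simply from `len(found_valid_df)`. A sketch on a few made-up rows:

```python
import pandas as pd

toy_df = pd.DataFrame({"fall": ["Fell", "Found", "Found", "Found", "Fell"],
                       "nametype": ["Valid", "Valid", "Relict", "Valid", "Valid"]})

# Rows are 'fall' values, columns are 'nametype' values, cells are counts.
counts = pd.crosstab(toy_df["fall"], toy_df["nametype"])
print(counts)
print(counts.loc["Found", "Valid"])  # 2
```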

---

In [None]:
# create a cartogram for the landing sites of the meteorites found in the withered condition.
import folium
map1 = folium.Map(location=[0, 0], width="90%", height="90%", zoom_start=1, tiles="stamen terrain")
for i in found_relict_df.index:
    folium.Marker(location=[found_relict_df.loc[i, 'reclat'], found_relict_df.loc[i, 'reclong']],
                  popup=found_relict_df.loc[i, 'name']).add_to(map1)
map1


In the above code:

1. We created a world map centred at latitude 0 and longitude 0 (where the equator meets the prime meridian) by passing `location=[0, 0]` to the `Map()` function, with `zoom_start` set to 1 so that the whole world is visible. For better visibility, we chose the `Stamen Terrain` background (`tiles="stamen terrain"`) for the map.

2. Next, we add a marker to the map for each meteorite in the `found_relict_df` DataFrame using a `for` loop.

    - The `for` loop iterates through each index of the `found_relict_df` DataFrame. Recall that the `index` attribute of a pandas DataFrame (or Series) returns its row labels.

    ```
    for i in found_relict_df.index:
    ```

    - Inside the loop, the `loc[]` indexer retrieves the `reclat`, `reclong` and `name` values of the current row, which are passed to `folium.Marker()` as its `location` and `popup`.

    ```
    for i in found_relict_df.index:
        folium.Marker(location=[found_relict_df.loc[i, 'reclat'], found_relict_df.loc[i, 'reclong']],
                      popup=found_relict_df.loc[i, 'name']).add_to(map1)
    ```

    - The `add_to()` method adds each marker to the `map1` map.
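
The loop above looks up each row three times with `.loc[]`. A minimal sketch of the same idea that reads the three columns once with `zip()`; the helper names `marker_points` and `build_marker_map` are hypothetical. It uses folium's `"OpenStreetMap"` tiles on the assumption that the Stamen tiles may not load in recent folium releases (Stamen's hosted tiles moved to Stadia Maps in 2023). folium is imported inside `build_marker_map()` so the data-preparation part runs without it:

```python
import pandas as pd


def marker_points(df: pd.DataFrame) -> list:
    """Return (lat, long, name) tuples for rows that have both coordinates."""
    located = df.dropna(subset=["reclat", "reclong"])
    return list(zip(located["reclat"], located["reclong"], located["name"]))


def build_marker_map(df: pd.DataFrame, tiles: str = "OpenStreetMap"):
    """Hypothetical helper: one folium marker per row, popup shows the place name."""
    import folium  # imported here so marker_points() works without folium

    world_map = folium.Map(location=[0, 0], zoom_start=1, tiles=tiles)
    for lat, lon, name in marker_points(df):
        folium.Marker(location=[lat, lon], popup=name).add_to(world_map)
    return world_map
```

In the notebook, `build_marker_map(found_relict_df)` should show the same markers as `map1`, on a different background.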




---

#### Activity 4: Cartograms For Good Condition Meteorites

Now, let's make a cartogram for the meteorites that were found in good condition (`found_valid_df`). As we saw earlier, 30,871 such meteorites were found, and adding a marker for every one of them would make the map cluttered and slow to render. So, we slice the `found_valid_df` DataFrame by year.

Let's count how many meteorites were found in good condition after the year 2010.

In [None]:
# find out how many meteorites were found in good condition after the year 2010.
found_valid_df[found_valid_df['year'] > 2010].shape

(398, 10)

So, 398 meteorites were found in good condition after the year 2010. Let's create a cartogram for these meteorites, following the same steps we used for the withered meteorites.



In [None]:
# create a cartogram for the meteorites found in good condition after the year 2010, showing the place name in each popup.
map2 = folium.Map(location=[0, 0], width="90%", height="90%", zoom_start=1, tiles="stamen terrain")
for i in found_valid_df[found_valid_df['year'] > 2010].index:
    folium.Marker(location=[found_valid_df.loc[i, 'reclat'], found_valid_df.loc[i, 'reclong']],
                  popup=found_valid_df.loc[i, 'name']).add_to(map2)
map2

Similarly, let's create a cartogram for the meteorites found in good condition from 2008 to 2010 (both inclusive).

In [None]:
# find out how many meteorites were found in good condition from 2008 to 2010 (both inclusive).
found_valid_df[(found_valid_df['year'] >= 2008) & (found_valid_df['year'] <= 2010)].shape

(976, 10)

So, 976 meteorites were found in good condition from 2008 to 2010 (both inclusive). Now, let's create a cartogram for these meteorites.

In [None]:
# create a cartogram for the meteorites found in good condition from 2008 to 2010 (both inclusive).
map3 = folium.Map(location=[0, 0], width="90%", height="90%", zoom_start=1, tiles="stamen terrain")
for i in found_valid_df[(found_valid_df['year'] >= 2008) & (found_valid_df['year'] <= 2010)].index:
    folium.Marker(location=[found_valid_df.loc[i, 'reclat'], found_valid_df.loc[i, 'reclong']],
                  popup=found_valid_df.loc[i, 'name']).add_to(map3)
map3

In [None]:
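# Same map, but each popup shows the place name and the mass in kilograms on separate lines.
map3 = folium.Map(location=[0, 0], width="90%", height="90%", zoom_start=1, tiles="stamen terrain")
for i in found_valid_df[(found_valid_df['year'] >= 2008) & (found_valid_df['year'] <= 2010)].index:
    mass_kg = found_valid_df.loc[i, 'mass'] / 1000  # 'mass' is stored in grams
    # Popups are rendered as HTML, so "<br>" (not "\n") starts a new line.
    folium.Marker(location=[found_valid_df.loc[i, 'reclat'], found_valid_df.loc[i, 'reclong']],
                  popup=found_valid_df.loc[i, 'name'] + "<br>" + str(mass_kg) + " kg").add_to(map3)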
map3

If you click on any of the markers, you will see that it displays both the place name and the mass of the meteorite (in kilograms). We can also add circular markers whose radius represents the mass of the meteorite, so the bigger the circle, the heavier the meteorite. Note that `folium.Circle()` measures `radius` in metres, so passing the mass in grams draws a circle whose radius in metres equals the mass in grams.

In [None]:
# add circular markers such that the radius of each circle represents the meteorite's mass.
map4 = folium.Map(location=[0, 0], width="90%", height="90%", zoom_start=1, tiles="stamen terrain")
for i in found_valid_df[(found_valid_df['year'] >= 2008) & (found_valid_df['year'] <= 2010)].index:
    folium.Circle(location=[found_valid_df.loc[i, 'reclat'], found_valid_df.loc[i, 'reclong']],
                  popup=found_valid_df.loc[i, 'name'] + "<br>" + str(found_valid_df.loc[i, 'mass'] / 1000) + " kg",
                  radius=found_valid_df.loc[i, 'mass'],  # folium interprets this value in metres
                  color="red", fill=True, fill_color="red").add_to(map4)
map4
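
Because the mass in grams becomes the radius in metres, a 1,000 kg meteorite (1,000,000 g) is drawn with a 1,000 km radius, while a few-gram meteorite is invisible at `zoom_start=1`. One common alternative, sketched below under our own assumptions, is `folium.CircleMarker`, whose `radius` is in screen pixels, with the radius scaled by the square root of the mass so that circle *area* grows in proportion to mass. The helpers `pixel_radius()` and `build_circle_map()` and their default sizes are hypothetical choices, not part of the original notebook:

```python
import math

import pandas as pd


def pixel_radius(mass_g, max_mass_g, max_px=20.0, min_px=2.0):
    """Map a mass in grams to a circle radius in pixels.

    Radius grows with sqrt(mass), so circle area is proportional to mass;
    the result is clamped to [min_px, max_px] so tiny masses stay visible.
    """
    if pd.isna(mass_g) or mass_g <= 0 or max_mass_g <= 0:
        return min_px
    radius = max_px * math.sqrt(mass_g / max_mass_g)
    return max(min_px, min(max_px, radius))


def build_circle_map(df: pd.DataFrame):
    """Hypothetical helper: one pixel-sized circle per row of df."""
    import folium  # only needed when the map is actually drawn

    largest = df["mass"].max()
    circle_map = folium.Map(location=[0, 0], zoom_start=1)
    for lat, lon, name, mass in zip(df["reclat"], df["reclong"], df["name"], df["mass"]):
        folium.CircleMarker(location=[lat, lon],
                            radius=pixel_radius(mass, largest),
                            popup=f"{name}<br>{mass / 1000} kg",
                            color="red", fill=True, fill_color="red").add_to(circle_map)
    return circle_map
```

In the notebook, `build_circle_map(found_valid_df[(found_valid_df['year'] >= 2008) & (found_valid_df['year'] <= 2010)])` would draw the same meteorites as `map4`, with circle sizes that stay readable at world zoom.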

---