Download the following data source: Meteorite Landings in NASA's Open Data Portal.

In [22]:
import pandas as pd
meteors = pd.read_csv("Meteorite_Landings.csv", encoding="utf-8")  # explicit UTF-8 (also pandas' default) so diacritics decode correctly
meteors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45716 entries, 0 to 45715
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         45716 non-null  object 
 1   id           45716 non-null  int64  
 2   nametype     45716 non-null  object 
 3   recclass     45716 non-null  object 
 4   mass (g)     45585 non-null  float64
 5   fall         45716 non-null  object 
 6   year         45425 non-null  float64
 7   reclat       38401 non-null  float64
 8   reclong      38401 non-null  float64
 9   GeoLocation  38401 non-null  object 
dtypes: float64(4), int64(1), object(5)
memory usage: 3.5+ MB


Remove the nametype field.

In [23]:
meteors.drop('nametype',inplace=True,axis=1)
meteors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45716 entries, 0 to 45715
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         45716 non-null  object 
 1   id           45716 non-null  int64  
 2   recclass     45716 non-null  object 
 3   mass (g)     45585 non-null  float64
 4   fall         45716 non-null  object 
 5   year         45425 non-null  float64
 6   reclat       38401 non-null  float64
 7   reclong      38401 non-null  float64
 8   GeoLocation  38401 non-null  object 
dtypes: float64(4), int64(1), object(4)
memory usage: 3.1+ MB
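As an alternative to dropping the column after loading, the field can be skipped at read time. A minimal sketch, assuming a tiny in-memory CSV (hypothetical, standing in for Meteorite_Landings.csv) and using the callable form of the `usecols` parameter of `pd.read_csv`:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for Meteorite_Landings.csv
csv_text = "name,id,nametype,recclass\nAachen,1,Valid,L5\nAarhus,2,Valid,H6\n"

# usecols accepts a callable that is evaluated against each column name;
# only the columns for which it returns True are loaded.
meteors_small = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != "nametype")
print(meteors_small.columns.tolist())  # ['name', 'id', 'recclass']
```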


Clean the mass (g) field so that there's a default value of 0 where there is no mass listed.

In [24]:
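# Assign the result back: chained fillna(..., inplace=True) stops updating meteors under Copy-on-Write (the pandas 3.0 default)
meteors['mass (g)'] = meteors['mass (g)'].fillna(0)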
meteors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45716 entries, 0 to 45715
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         45716 non-null  object 
 1   id           45716 non-null  int64  
 2   recclass     45716 non-null  object 
 3   mass (g)     45716 non-null  float64
 4   fall         45716 non-null  object 
 5   year         45425 non-null  float64
 6   reclat       38401 non-null  float64
 7   reclong      38401 non-null  float64
 8   GeoLocation  38401 non-null  object 
dtypes: float64(4), int64(1), object(4)
memory usage: 3.1+ MB
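An equivalent way to apply the default of 0 is to call `DataFrame.fillna` with a dict of per-column fill values, which never goes through a chained column lookup. A minimal sketch on hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for meteors (third row has no mass listed)
meteors_small = pd.DataFrame({"name": ["Aachen", "Aarhus", "no-mass example"],
                              "mass (g)": [21.0, 720.0, np.nan]})

# A dict restricts the fill to the listed columns; NaNs elsewhere are untouched,
# and the result is assigned back instead of being modified through a chained lookup.
meteors_small = meteors_small.fillna({"mass (g)": 0})
print(meteors_small["mass (g)"].tolist())  # [21.0, 720.0, 0.0]
```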


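There is missing year data (291 of the 45,716 rows). Determine whether to handle it; rows without a year cannot be assigned to a decade, so they are dropped here.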

In [25]:
has_year=meteors.dropna(subset=['year'])
has_year.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 45425 entries, 0 to 45715
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         45425 non-null  object 
 1   id           45425 non-null  int64  
 2   recclass     45425 non-null  object 
 3   mass (g)     45425 non-null  float64
 4   fall         45425 non-null  object 
 5   year         45425 non-null  float64
 6   reclat       38223 non-null  float64
 7   reclong      38223 non-null  float64
 8   GeoLocation  38223 non-null  object 
dtypes: float64(4), int64(1), object(4)
memory usage: 3.5+ MB
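One way to decide whether the missing years matter is to measure their share. In this sketch the arithmetic uses the counts reported by `info()` above, and the helper `missing_share` (a hypothetical name) computes the same kind of figure for any DataFrame with `isna().mean()`:

```python
import numpy as np
import pandas as pd

# Counts taken from the info() output: total rows vs. non-null 'year' values
total_rows, rows_with_year = 45716, 45425
missing = total_rows - rows_with_year
print(missing, f"{missing / total_rows:.2%}")  # 291 0.64%


def missing_share(df: pd.DataFrame, column: str) -> float:
    """Fraction of rows whose value in `column` is NaN (hypothetical helper)."""
    return float(df[column].isna().mean())


# Toy check: one of four years is missing
toy = pd.DataFrame({"year": [1880.0, 1951.0, np.nan, 1976.0]})
print(missing_share(toy, "year"))  # 0.25
```

A share well under 1% is usually safe to drop, as `dropna(subset=['year'])` does, though that remains a judgment call for the analysis at hand.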


Store each decade's data in its own sheet of an Excel workbook using pandas: 1990-1999, 2000-2009, 2010-2019, and so on.

In [32]:
# Reassigning (instead of writing into the dropna() result) avoids the SettingWithCopyWarning
has_year = has_year.astype({'year': int})
pre_20th_century = has_year[has_year.year < 1900]
with pd.ExcelWriter('meteorites_by_decade.xlsx') as xlsxWriter:
    pre_20th_century.to_excel(xlsxWriter, sheet_name="pre 20th century")
    # One sheet per decade, "The 1900s" through "The 2020s"
    for start in range(1900, 2030, 10):
        years_in_decade = has_year[(has_year.year >= start) & (has_year.year < start + 10)]
        years_in_decade.to_excel(xlsxWriter, sheet_name=f"The {start}s")
# Leaving the with block saves and closes the workbook, so no xlsxWriter.save() call
# is needed (ExcelWriter.save() was removed in pandas 2.0)
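The fixed 1900s-2020s range above has to be extended by hand as new decades arrive. A sketch of a more general split, assuming each year can be floored to its decade with integer division and then grouped; the function names are hypothetical, and the toy rows reuse names and years from the `head()` output. Unlike the cell above, it creates a sheet for every decade present from 1900 onward:

```python
from __future__ import annotations

import pandas as pd


def split_by_decade(df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Map sheet names to row subsets: everything before 1900 on one sheet,
    then one sheet per decade that actually occurs (hypothetical helper)."""
    sheets = {"pre 20th century": df[df["year"] < 1900]}
    modern = df[df["year"] >= 1900]
    # Integer division floors each year to the start of its decade: 1976 -> 1970
    decade_start = modern["year"].astype(int) // 10 * 10
    for start, rows in modern.groupby(decade_start):  # keys come back sorted
        sheets[f"The {start}s"] = rows
    return sheets


def write_workbook(sheets: dict[str, pd.DataFrame], path: str) -> None:
    """Write each subset to its own sheet; requires an Excel engine such as openpyxl."""
    with pd.ExcelWriter(path) as writer:  # leaving the block saves and closes the file
        for sheet_name, rows in sheets.items():
            rows.to_excel(writer, sheet_name=sheet_name)


# Hypothetical toy rows; names and years are taken from the head() output
toy = pd.DataFrame({"name": ["Aachen", "Achiras", "Aarhus", "Abee", "Acapulco"],
                    "year": [1880, 1902, 1951, 1952, 1976]})
sheets = split_by_decade(toy)
print(list(sheets))  # ['pre 20th century', 'The 1900s', 'The 1950s', 'The 1970s']
```

With the real data, something like `write_workbook(split_by_decade(has_year), 'meteorites_by_decade.xlsx')` should produce the workbook, given an installed Excel engine.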


Format the date in MM/dd/yyyy with the time in hours:minutes:seconds AM/PM.

In [21]:
meteors.head()

   name      id   recclass     mass (g)  fall  year    reclat     reclong   GeoLocation
0  Aachen    1    L5           21.0      Fell  1880.0  50.775     6.08333   (50.775, 6.08333)
1  Aarhus    2    H6           720.0     Fell  1951.0  56.18333   10.23333  (56.18333, 10.23333)
2  Abee      6    EH4          107000.0  Fell  1952.0  54.21667   -113.0    (54.21667, -113.0)
3  Acapulco  10   Acapulcoite  1914.0    Fell  1976.0  16.88333   -99.9     (16.88333, -99.9)
4  Achiras   370  L6           780.0     Fell  1902.0  -33.16667  -64.95    (-33.16667, -64.95)
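The `head()` output shows `year` holding only a year, so producing a full MM/dd/yyyy hh:mm:ss AM/PM value requires assuming a day and time. This minimal sketch pins every date to January 1 at midnight, which is an assumption rather than something in the data. It uses Python's `datetime` instead of `pd.to_datetime`, because pandas' nanosecond timestamps only cover roughly the years 1677 to 2262, while `datetime` accepts years 1 to 9999. The function name is hypothetical:

```python
from datetime import datetime

import pandas as pd


def year_to_timestamp_text(year: int) -> str:
    """Render a bare year as 'MM/dd/yyyy hh:mm:ss AM/PM'.

    Assumption: only the year is known, so the date is pinned to
    January 1 at 12:00:00 AM.
    """
    stamp = datetime(int(year), 1, 1)  # Python datetime accepts years 1-9999
    # Pad the year by hand: strftime's %Y padding for years < 1000 is platform-dependent.
    date_part = f"{stamp.month:02d}/{stamp.day:02d}/{stamp.year:04d}"
    # %I = 12-hour clock, %p = AM/PM (English in the default C locale)
    time_part = stamp.time().strftime("%I:%M:%S %p")
    return f"{date_part} {time_part}"


# Hypothetical toy rows using years from the head() output above
toy = pd.DataFrame({"name": ["Aachen", "Aarhus"], "year": [1880.0, 1951.0]})
toy["date"] = toy["year"].map(year_to_timestamp_text)
print(toy["date"].tolist())  # ['01/01/1880 12:00:00 AM', '01/01/1951 12:00:00 AM']
```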


Make sure that all diacritics (for example, the umlauts over the o's in records 44806 and 44843-44847) are loaded correctly when the data is imported.
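`pd.read_csv` decodes files as UTF-8 by default, so names such as Aïr, Baszkówka and Béréba should come through intact as long as the file really is UTF-8. A minimal sketch, using a hypothetical in-memory byte string in place of the downloaded file: it reads the bytes with an explicit `encoding`, shows the mojibake that a wrong encoding (Latin-1) produces, and flags names containing combining diacritics via `unicodedata`:

```python
import io
import unicodedata

import pandas as pd

# Hypothetical stand-in for Meteorite_Landings.csv: names copied from the
# notebook output, encoded to UTF-8 bytes as a downloaded file would be.
raw = "name\nAachen\nAïr\nBaszkówka\nBéréba\n".encode("utf-8")


def has_diacritic(text: str) -> bool:
    """True if any character decomposes (NFD) into a base letter plus a combining mark."""
    return any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text))


# Explicit encoding: UTF-8 is pandas' default, but stating it documents the intent.
good = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
# Decoding the same bytes as Latin-1 shows what a wrong encoding does (mojibake).
bad = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(good["name"].tolist())  # ['Aachen', 'Aïr', 'Baszkówka', 'Béréba']
print(bad["name"].tolist())   # starts ['Aachen', 'AÃ¯r', ...]
print(good.loc[good["name"].map(has_diacritic), "name"].tolist())  # ['Aïr', 'Baszkówka', 'Béréba']
```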

In [20]:
names = meteors['name'].tolist()
print(len(names))
for name in names:
    print(name)

45716
Aachen
Aarhus
Abee
Acapulco
Achiras
Adhi Kot
Adzhi-Bogdo (stone)
Agen
Aguada
Aguila Blanca
Aioun el Atrouss
Aïr
Aire-sur-la-Lys
Akaba
Akbarpur
Akwanga
Akyumak
Al Rais
Al Zarnkh
Alais
Albareto
Alberta
Alby sur Chéran
Aldsworth
Aleppo
Alessandria
Alexandrovsky
Alfianello
Allegan
Allende
Almahata Sitta
Alta'ameem
Ambapur Nagla
Andhara
Andover
Andreevka
Andura
Northwest Africa 5815
Angers
Angra dos Reis (stone)
Ankober
Anlong
Aomori
Appley Bridge
Apt
Arbol Solo
Archie
Arroyo Aguiar
Asco
Ash Creek
Ashdon
Assisi
Atarra
Atemajac
Athens
Atoka
Aubres
Aumale
Aumieres
Ausson
Avanhandava
Avce
Avilez
Awere
Aztec
Bachmut
Bahjoi
Bald Mountain
Baldwyn
Bali
Ban Rong Du
Bandong
Bansur
Banswal
Banten
Barbotan
Barcelona (stone)
Barea
Barnaul
Barntrup
Baroti
Barwell
Bassikounou
Baszkówka
Bath
Bath Furnace
Battle Mountain
Bawku
Baxter
Beardsley
Beaver Creek
Beddgelert
Bells
Belville
Benares (a)
Benguerir
Beni M'hira
Benld
Benoni
Bensour
Benton
Berduc
Béréba
Berlanguillas
Berthoud
Bethlehem
Beuste
Beyr