# DEEP SEA CORALS PROJECT
***

# Goals
***

- Draw insights about corals
- Create a model that can predict the vernacular name categories of the corals

# Acquire
Acquiring the data from a local CSV file
***

In [1]:
# establishing environment
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

In [2]:
# importing data
df = pd.read_csv('deep_sea_corals.csv')

In [3]:
# previewing data
df.head()

Unnamed: 0,CatalogNumber,DataProvider,ScientificName,VernacularNameCategory,TaxonRank,Station,ObservationDate,latitude,longitude,DepthInMeters,DepthMethod,Locality,LocationAccuracy,SurveyID,Repository,IdentificationQualifier,EventID,SamplingEquipment,RecordType,SampleID
0,,,,,,,,degrees_north,degrees_east,,,,,,,,,,,
1,625366.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-02,18.30817,-158.45392,959.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:45:26:28
2,625373.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30864,-158.45393,953.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:24:35:53
3,625386.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30877,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:15:22:09
4,625382.0,"NOAA, Deep Sea Coral Research & Technology Pro...",Madrepora oculata,stony coral (branching),species,D2-EX1504L3-05,2015-09-01,18.30875,-158.45384,955.0,reported,"Hawaiian Archipelago, Swordfish Seamount",50m,Hohonu Moana: Exploring Deep Waters off Hawai'i,University of Hawaii,ID by expert from video,D2-EX1504L3-05,ROV,video observation,EX1504L3_05_20150901T181522Z.mp4_05:13:29:50


# Prepare
Preparing the data for exploration and modeling
***

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513373 entries, 0 to 513372
Data columns (total 20 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   CatalogNumber            513372 non-null  float64
 1   DataProvider             513372 non-null  object 
 2   ScientificName           513372 non-null  object 
 3   VernacularNameCategory   513197 non-null  object 
 4   TaxonRank                513364 non-null  object 
 5   Station                  253590 non-null  object 
 6   ObservationDate          513367 non-null  object 
 7   latitude                 513373 non-null  object 
 8   longitude                513373 non-null  object 
 9   DepthInMeters            513372 non-null  float64
 10  DepthMethod              496845 non-null  object 
 11  Locality                 389645 non-null  object 
 12  LocationAccuracy         484662 non-null  object 
 13  SurveyID                 306228 non-null  object 
 14  Repo

- Drop CatalogNumber and SampleID columns
     - The information in these columns will not be useful for the operations of this project
     
     
- Many null values
    - I'll drop them after dropping the columns I don't plan to use in this first iteration of the project
        - If too many rows are lost, I'll impute values to preserve more rows
        
        
- Data types look okay for now, but I'll update them if needed to facilitate operations


- Rename columns 
    - all lowercase
    - "_" between words in names


- Make all values lowercase where applicable
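Two items in this plan are not carried out in the cells that follow: lowercasing values and revisiting data types. The preview shows a units row (`degrees_north`, `degrees_east`) at index 0, which is the likely reason `latitude` and `longitude` load as `object`. The sketch below shows one way both steps might look on a toy frame. The toy values and the helper name `tidy_values` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for the real data (illustrative values only)
toy = pd.DataFrame({
    "ScientificName": ["Units Row", "Madrepora oculata", "Lophelia pertusa"],
    "latitude": ["degrees_north", "18.30817", "27.5"],
    "longitude": ["degrees_east", "-158.45392", "-91.2"],
})


def tidy_values(frame, numeric_cols):
    """Hypothetical helper: coerce numeric columns, lowercase the remaining text columns."""
    out = frame.copy()
    for col in numeric_cols:
        # entries that are not numbers (such as the units row) become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # assumes every remaining non-numeric column holds text
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].str.lower()
    return out


tidy = tidy_values(toy, ["latitude", "longitude"])
print(tidy.dtypes)
```

Rows whose coordinates become `NaN` would then be removed by the later `dropna()` step, which is one way the units row could drop out of the data.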

### Dropping Columns

In [5]:
# dropping specified columns
df = df.drop(columns = ['CatalogNumber', 'SampleID'])

### Dropping Nulls

In [6]:
# dropping all rows that contain any null value
df = df.dropna()
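Comparing the two `df.info()` outputs in this notebook, the frame goes from 513,373 entries before this step to 180,466 after it, so roughly 65% of rows are lost. The sketch below shows how that loss could be measured, and how the imputation fallback mentioned in the plan might look. The toy values, the `"unknown"` placeholder, and the choice of the median are assumptions for illustration only.

```python
import pandas as pd

# Toy frame with gaps (illustrative values, not taken from the dataset)
toy = pd.DataFrame({
    "VernacularNameCategory": ["sea pen", "soft coral", None, "black coral"],
    "Station": ["14", None, "VK826", None],
    "DepthInMeters": [959.0, 953.0, 955.0, None],
})

# Share of rows lost when every row with any null is dropped
dropped_all = toy.dropna()
loss_all = 1 - len(dropped_all) / len(toy)

# Alternative: require only the target column, then impute the rest
kept = toy.dropna(subset=["VernacularNameCategory"]).copy()
kept["Station"] = kept["Station"].fillna("unknown")
kept["DepthInMeters"] = kept["DepthInMeters"].fillna(kept["DepthInMeters"].median())
loss_subset = 1 - len(kept) / len(toy)

print(f"dropna() on all columns loses {loss_all:.0%} of rows")
print(f"dropna(subset=...) plus imputation loses {loss_subset:.0%} of rows")
```

Whether imputed placeholders are acceptable depends on how each column is used later, so this is a starting point rather than a recommendation.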

### Renaming Columns

In [20]:
# adding underscores to various column names
df.columns = ['Data_Provider', 'Scientific_Name', 'Vernacular_Name_Category', 'Taxon_Rank',
       'Station', 'Observation_Date', 'latitude', 'longitude', 'Depth_Meters',
       'Depth_Method', 'Locality', 'Location_Accuracy', 'Survey_ID', 'Repository',
       'Identification_Qualifier', 'Event_ID', 'Sampling_Equipment',
       'Record_Type']

# lowercasing all column names
df.columns = df.columns.str.lower()
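Assigning a hard-coded list works, but it depends on the column order staying fixed. As an alternative, the names could be derived with a regular expression. The helper `to_snake_case` below is a hypothetical sketch, and it does not reproduce the manual list exactly: it turns `DepthInMeters` into `depth_in_meters`, whereas the cell above uses `depth_meters`.

```python
import re


def to_snake_case(name):
    """Hypothetical helper: 'DataProvider' -> 'data_provider', 'SurveyID' -> 'survey_id'."""
    # insert "_" wherever a lowercase letter or digit is followed by an uppercase letter
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return spaced.lower()


original = ["DataProvider", "VernacularNameCategory", "SurveyID", "latitude", "DepthInMeters"]
renamed = [to_snake_case(col) for col in original]
print(renamed)
```

On a DataFrame, the same function could be applied with `df.rename(columns=to_snake_case)`, which accepts a callable.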

# Explore
Exploring the data to draw insights about the corals
***

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 180466 entries, 1 to 511957
Data columns (total 18 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   data_provider             180466 non-null  object 
 1   scientific_name           180466 non-null  object 
 2   vernacular_name_category  180466 non-null  object 
 3   taxon_rank                180466 non-null  object 
 4   station                   180466 non-null  object 
 5   observation_date          180466 non-null  object 
 6   latitude                  180466 non-null  object 
 7   longitude                 180466 non-null  object 
 8   depth_meters              180466 non-null  float64
 9   depth_method              180466 non-null  object 
 10  locality                  180466 non-null  object 
 11  location_accuracy         180466 non-null  object 
 12  survey_id                 180466 non-null  object 
 13  repository                180466 non-null  o

In [9]:
df.DataProvider.value_counts()

NOAA, Alaska Fisheries Science Center                                                              57621
NOAA, Southwest Fisheries Science Center, Santa Cruz                                               42456
NOAA, Olympic Coast National Marine Sanctuary                                                      35821
NOAA, Office of Ocean Exploration and Research                                                     11149
Temple University                                                                                  10893
Harbor Branch Oceanographic Institute                                                               9365
NOAA, Deep Sea Coral Research & Technology Program and Office of Ocean Exploration and Research     6166
Ross, Steve                                                                                         2859
NOAA, Northwest Fisheries Science Center                                                            1553
Bureau of Ocean Energy Management                      

In [10]:
df.ScientificName.value_counts()

Porifera              33618
Stylaster sp.         12403
Lophelia pertusa       9636
Gorgonacea             8443
Latrunculia sp.        7332
                      ...  
Anthomuricea sp.          1
Vulcanellidae             1
Myriopathes ulex          1
Plumarella superba        1
Telopathes sp.            1
Name: ScientificName, Length: 577, dtype: int64

In [11]:
df.VernacularNameCategory.value_counts()

gorgonian coral               52346
sponge (unspecified)          33618
demosponge                    27965
glass sponge                  16331
lace coral                    16183
stony coral (branching)       11071
soft coral                     8724
black coral                    6443
sea pen                        3628
stony coral (cup coral)        2019
calcareous sponge               749
stony coral (unspecified)       607
stoloniferan coral              360
scleromorph sponge              258
gold coral                       85
other coral-like hydrozoan       79
Name: VernacularNameCategory, dtype: int64
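Since `VernacularNameCategory` is the prediction target, the counts above also indicate how imbalanced the classes are. The sketch below re-enters those counts by hand and computes each class share, plus the accuracy a model would reach by always predicting the most common class. Treating that figure as a baseline is an assumption about how the modeling step will be evaluated.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({
    "gorgonian coral": 52346,
    "sponge (unspecified)": 33618,
    "demosponge": 27965,
    "glass sponge": 16331,
    "lace coral": 16183,
    "stony coral (branching)": 11071,
    "soft coral": 8724,
    "black coral": 6443,
    "sea pen": 3628,
    "stony coral (cup coral)": 2019,
    "calcareous sponge": 749,
    "stony coral (unspecified)": 607,
    "stoloniferan coral": 360,
    "scleromorph sponge": 258,
    "gold coral": 85,
    "other coral-like hydrozoan": 79,
})

# Share of all records that falls in each class
shares = counts / counts.sum()

# Accuracy of always predicting the largest class
majority_baseline = shares.max()

print(shares.round(4))
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```

On the live frame, `df.VernacularNameCategory.value_counts(normalize=True)` gives the same shares directly.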

In [12]:
df.TaxonRank.value_counts()

species       53305
genus         52809
phylum        33618
order         16318
family        12864
class          6940
subfamily      3512
subgenus        459
subspecies      444
suborder        126
subclass         71
Name: TaxonRank, dtype: int64

In [13]:
df.Station.value_counts()

14            12969
VK826          8659
transect 1     7357
Area_2         7005
Area_1         6428
              ...  
320-166           1
337-165           1
Dec-38            1
217-123           1
197-91            1
Name: Station, Length: 5304, dtype: int64

In [14]:
df.ObservationDate.value_counts()

2008-07-11    12987
2008-07-13     6217
2010-06-29     5427
2004-08-05     5381
2014-09-15     4603
              ...  
2005-08-10        1
2012-07-22        1
1992-08-18        1
1987-08-18        1
1994-09-23        1
Name: ObservationDate, Length: 2798, dtype: int64
