# Fish Welfare Project
## Part 2: Supplemental data

* Author: Angelina Li
* Date: 2019/09/11
* Description: Now that we have a collection of information scraped from FishEthoBase, it would be useful to collect supplemental information on each species where possible. In particular, I'm interested in finding population and catch-number data for these species.

## Notebook tasks
1. Import the DB data, and possibly convert it into a dataframe for easier use.
2. Research and integrate other data sources where possible.

In [1]:
import json
import os
import random
import re
import time  # used by get_soup() to pause between requests

import pandas as pd
import requests

# Eventually I might migrate to Scrapy, but BeautifulSoup is easier to use in
# Jupyter and makes more sense for a small, single-scrape project.
from bs4 import BeautifulSoup

In [2]:
MAIN_DIR = ".."
DATA_DIR = os.path.join(MAIN_DIR, "data")
DB_FILEPATH = os.path.join(DATA_DIR, "fishdb.json")

S_PAUSE = 5 # how many seconds to pause in between requests
REQ_SUCCESS = 200 # success status code

In [3]:
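# define some potentially useful helper functions first

def get_soup(url_address, pause_secs=S_PAUSE):
    """Fetch url_address and return its parsed HTML, or None if the request fails."""
    page = requests.get(url_address, timeout=30)  # don't hang forever on a dead server
    # pause for a randomized interval (0.5x to 1.5x pause_secs) after every
    # request, successful or not, so we don't hit the server at a fixed rate
    time.sleep(random.uniform(0.5, 1.5) * pause_secs)
    if page.status_code != REQ_SUCCESS:
        print("Couldn't load content on this page:", url_address)
        return None
    print("Loaded page:", url_address)
    return BeautifulSoup(page.content, "html.parser")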
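`get_soup()` waits a randomized multiple of `S_PAUSE` after a request, so consecutive requests don't arrive at a perfectly regular interval. The sketch below isolates just that delay calculation. `jittered_pause` is a hypothetical helper name, not part of the notebook; passing a seeded `random.Random` instance makes the delays reproducible, which is handy when testing.

```python
import random

S_PAUSE = 5  # base pause in seconds, mirroring the notebook's constant

def jittered_pause(pause_secs=S_PAUSE, rng=random):
    """Return a randomized delay between 0.5x and 1.5x pause_secs.

    With pause_secs=5 every delay falls between 2.5 and 7.5 seconds.
    """
    return rng.uniform(0.5, 1.5) * pause_secs

rng = random.Random(0)  # seeded so the example is reproducible
delays = [jittered_pause(rng=rng) for _ in range(5)]
print([round(d, 2) for d in delays])
```

Randomizing the pause, rather than sleeping exactly `S_PAUSE` seconds, is a common politeness convention for small scrapers; it doesn't change the average request rate.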

In [4]:
# grab the original dataset
with open(DB_FILEPATH, "r") as datafile:
    db_data = json.load(datafile)

db_data[0]

{'link_summary': 'http://fishethobase.net/db/28/',
 'name_latin': 'Octopus vulgaris',
 'name_english': 'Common octopus',
 'sp_id': 'commonoctopus',
 'description': 'Octopus vulgaris has recently aroused much interest in aquaculture, considered suitable for large-scale production given its commercial value, its fecundity, rapid growth, high protein content, and high feed conversion rate. The main problem, however, is the high mortality rate observed during paralarval rearing, making successful juvenile settlement still very difficult to achieve. Unfortunately, despite the high knowledge on the biology and ethology of this species, there are many other aspects to be solved from a welfare perspective. For instance, the current farming systems result in high stress in O. vulgaris due to spatial constraint, high densities and sociability, which consequently increase aggression (cannibalism and autophagy) at different life stages. In addition, octopus skin is particularly sensitive and can b

In [5]:
# flatten each species record so it can be loaded into a dataframe for readability.
def flatten_species_data(species_dict):
    flattened = species_dict.copy()  # shallow copy, so pop() below leaves the original record intact
    scores = flattened.pop("etho_scores")  # nested dict; its entries are re-added below as flat columns
    for crit in scores:
        for level in scores[crit]:
            flat_name = "{}_{}".format(crit, level[:2]) # take first two chars per level name
            flattened[flat_name] = scores[crit][level]
    return flattened
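To make the flattening concrete, here is the same function applied to a small hypothetical record. The nested shape `{criterion: {level: value}}` is inferred from the code above, and the level names `likelihood`, `potential` and `certainty` are an assumption based on the `_li`/`_po`/`_ce` suffixes in the real output; they are not confirmed against the scraped JSON.

```python
def flatten_species_data(species_dict):
    flattened = species_dict.copy()  # shallow copy: pop() won't touch the original
    scores = flattened.pop("etho_scores")
    for crit in scores:
        for level in scores[crit]:
            # e.g. ("home_range", "likelihood") -> "home_range_li"
            flat_name = "{}_{}".format(crit, level[:2])
            flattened[flat_name] = scores[crit][level]
    return flattened

# Hypothetical record; level names are assumed, not taken from fishdb.json
toy = {
    "sp_id": "toyfish",
    "etho_scores": {
        "home_range": {"likelihood": "low", "potential": "low", "certainty": "high"},
        "fishethoscore": {"likelihood": 0, "potential": 1, "certainty": 3},
    },
}
flat = flatten_species_data(toy)
print(flat)
# {'sp_id': 'toyfish', 'home_range_li': 'low', 'home_range_po': 'low', 'home_range_ce': 'high', 'fishethoscore_li': 0, 'fishethoscore_po': 1, 'fishethoscore_ce': 3}
```

One caveat of the two-character suffix: if two level names ever shared their first two letters, their columns would silently overwrite each other.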

flatten_species_data(db_data[0])

{'link_summary': 'http://fishethobase.net/db/28/',
 'name_latin': 'Octopus vulgaris',
 'name_english': 'Common octopus',
 'sp_id': 'commonoctopus',
 'description': 'Octopus vulgaris has recently aroused much interest in aquaculture, considered suitable for large-scale production given its commercial value, its fecundity, rapid growth, high protein content, and high feed conversion rate. The main problem, however, is the high mortality rate observed during paralarval rearing, making successful juvenile settlement still very difficult to achieve. Unfortunately, despite the high knowledge on the biology and ethology of this species, there are many other aspects to be solved from a welfare perspective. For instance, the current farming systems result in high stress in O. vulgaris due to spatial constraint, high densities and sociability, which consequently increase aggression (cannibalism and autophagy) at different life stages. In addition, octopus skin is particularly sensitive and can b

In [6]:
db_flattened_data = list(map(flatten_species_data, db_data))
db_df = pd.DataFrame(db_flattened_data)
db_df.head()

| | link_summary | name_latin | name_english | sp_id | description | link_profile | home_range_li | home_range_po | home_range_ce | depth_range_li | ... | malformation_li | malformation_po | malformation_ce | slaughter_li | slaughter_po | slaughter_ce | fishethoscore_li | fishethoscore_po | fishethoscore_ce | filename_image |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://fishethobase.net/db/28/ | Octopus vulgaris | Common octopus | commonoctopus | Octopus vulgaris has recently aroused much int... | http://fishethobase.net/db/28/shortprofile/ | low | low | high | low | ... | nofindings | unclear | nofindings | unclear | low | middle | 0 | 1 | 3 | |
| 1 | http://fishethobase.net/db/21/ | Litopenaeus vannamei | Pacific whiteleg shrimp | pacificwhitelegshrimp | Like other farmed shrimp species, the Pacific ... | http://fishethobase.net/db/21/shortprofile/ | unclear | nofindings | low | low | ... | unclear | nofindings | low | low | middle | middle | 0 | 2 | 3 | |
| 2 | http://fishethobase.net/db/34/ | Penaeus monodon | Giant tiger prawn (Black tiger | gianttigerprawnblacktiger | Penaeus monodon is one of the most cultivated ... | http://fishethobase.net/db/34/shortprofile/ | unclear | nofindings | low | low | ... | unclear | nofindings | low | low | middle | middle | 0 | 1 | 2 | gianttigerprawnblacktiger.jpg |
| 3 | http://fishethobase.net/db/2/ | Acipenser baerii | Siberian sturgeon | siberiansturgeon | Acipenser baerii, an endangered species accord... | http://fishethobase.net/db/2/shortprofile/ | unclear | middle | middle | low | ... | low | high | middle | low | high | middle | 0 | 2 | 0 | siberiansturgeon.jpg |
| 4 | http://fishethobase.net/db/3/ | Acipenser gueldenstaedtii | Russian sturgeon | russiansturgeon | Acipenser gueldenstaedtii is a critically enda... | http://fishethobase.net/db/3/shortprofile/ | unclear | nofindings | low | low | ... | unclear | middle | low | unclear | high | low | 0 | 2 | 2 | russiansturgeon.jpg |
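`pd.DataFrame` builds its columns from the union of keys across all records and fills any key a record lacks with `NaN`. Before analysing the frame it can help to know which species are missing which columns; the blank `filename_image` cells in the first two rows may be one such gap. The helper `missing_columns` and the sample records below are hypothetical, for illustration only.

```python
def missing_columns(records):
    """Map each record's index to the sorted columns it lacks.

    A column counts as missing if any other record has that key
    but this record does not.
    """
    all_keys = set()
    for rec in records:
        all_keys.update(rec)
    gaps = {}
    for i, rec in enumerate(records):
        missing = all_keys - set(rec)
        if missing:
            gaps[i] = sorted(missing)
    return gaps

# Hypothetical flattened records
records = [
    {"sp_id": "a", "home_range_li": "low"},
    {"sp_id": "b", "home_range_li": "unclear", "filename_image": "b.jpg"},
]
print(missing_columns(records))  # {0: ['filename_image']}
```

In the notebook, `missing_columns(db_flattened_data)` would run the same check on the real records.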


In [7]:
db_df[["name_english", "sp_id", "link_summary", "fishethoscore_li", "fishethoscore_po", "fishethoscore_ce"]].head(10)

| | name_english | sp_id | link_summary | fishethoscore_li | fishethoscore_po | fishethoscore_ce |
|---|---|---|---|---|---|---|
| 0 | Common octopus | commonoctopus | http://fishethobase.net/db/28/ | 0 | 1 | 3 |
| 1 | Pacific whiteleg shrimp | pacificwhitelegshrimp | http://fishethobase.net/db/21/ | 0 | 2 | 3 |
| 2 | Giant tiger prawn (Black tiger | gianttigerprawnblacktiger | http://fishethobase.net/db/34/ | 0 | 1 | 2 |
| 3 | Siberian sturgeon | siberiansturgeon | http://fishethobase.net/db/2/ | 0 | 2 | 0 |
| 4 | Russian sturgeon | russiansturgeon | http://fishethobase.net/db/3/ | 0 | 2 | 2 |
| 5 | Adriatic sturgeon | adriaticsturgeon | http://fishethobase.net/db/4/ | 0 | 1 | 0 |
| 6 | Sterlet sturgeon | sterletsturgeon | http://fishethobase.net/db/6/ | 0 | 1 | 0 |
| 7 | Stellate sturgeon | stellatesturgeon | http://fishethobase.net/db/5/ | 0 | 0 | 0 |
| 8 | White sturgeon | whitesturgeon | http://fishethobase.net/db/7/ | 0 | 1 | 2 |
| 9 | Hybrid sturgeon | hybridsturgeon | http://fishethobase.net/db/53/ | 0 | 0 | 0 |
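To compare species on one score level, the rows can be ranked. This stdlib sketch uses the ten rows printed above, copied by hand. It sorts by `fishethoscore_po` and breaks ties with `fishethoscore_ce`, both descending; the choice of sort keys is illustrative, not a recommendation. In the notebook itself, `db_df.sort_values(["fishethoscore_po", "fishethoscore_ce"], ascending=False)` is the pandas equivalent, though tied rows may come out in a different order.

```python
# (name_english, fishethoscore_li, fishethoscore_po, fishethoscore_ce),
# copied from the head(10) output above
rows = [
    ("Common octopus", 0, 1, 3),
    ("Pacific whiteleg shrimp", 0, 2, 3),
    ("Giant tiger prawn (Black tiger", 0, 1, 2),
    ("Siberian sturgeon", 0, 2, 0),
    ("Russian sturgeon", 0, 2, 2),
    ("Adriatic sturgeon", 0, 1, 0),
    ("Sterlet sturgeon", 0, 1, 0),
    ("Stellate sturgeon", 0, 0, 0),
    ("White sturgeon", 0, 1, 2),
    ("Hybrid sturgeon", 0, 0, 0),
]

def rank_by_score(rows):
    """Sort by the _po score, then the _ce score, both descending.

    Python's sort is stable even with reverse=True, so rows tied on
    both keys keep their original order.
    """
    return sorted(rows, key=lambda r: (r[2], r[3]), reverse=True)

for name, li, po, ce in rank_by_score(rows):
    print(f"{name:32} po={po} ce={ce}")
```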


In [8]:
# Just for ease of research, let's get a list of names for our species
db_df[["name_english", "name_latin"]]

| | name_english | name_latin |
|---|---|---|
| 0 | Common octopus | Octopus vulgaris |
| 1 | Pacific whiteleg shrimp | Litopenaeus vannamei |
| 2 | Giant tiger prawn (Black tiger | Penaeus monodon |
| 3 | Siberian sturgeon | Acipenser baerii |
| 4 | Russian sturgeon | Acipenser gueldenstaedtii |
| 5 | Adriatic sturgeon | Acipenser naccarii |
| 6 | Sterlet sturgeon | Acipenser ruthenus |
| 7 | Stellate sturgeon | Acipenser stellatus |
| 8 | White sturgeon | Acipenser transmontanus |
| 9 | Hybrid sturgeon | BAEyNAC, NACxBAE |
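When looking up these species in other sources, a Latin name may need to be split into genus and species, for example if a source takes them as separate fields (an assumption; this depends on the source). Not every entry is a plain binomial: the hybrid sturgeon's `BAEyNAC, NACxBAE` is not. The hypothetical helper below splits simple binomials and returns `None` for anything else, so those species can be flagged for manual research.

```python
import re

# "Genus species": capitalized genus, lowercase epithet, single space between
BINOMIAL = re.compile(r"^([A-Z][a-z]+) ([a-z]+)$")

def split_binomial(name_latin):
    """Return (genus, species) for a plain binomial, or None otherwise."""
    match = BINOMIAL.match(name_latin.strip())
    return match.groups() if match else None

names = ["Octopus vulgaris", "Litopenaeus vannamei", "Acipenser baerii", "BAEyNAC, NACxBAE"]
for n in names:
    print(n, "->", split_binomial(n))
```

In the notebook, `db_df["name_latin"].map(split_binomial)` would apply the same check to every row.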
