This is the machine learning portion of a larger project. You can read about the entire project here: Data Science for Global Wildlife Trafficking blog post. The Tableau dashboards can be found here:
This application provides an extensive modeling approach for wildlife trafficking (live animals and wildlife products) imported via shipments into the US. You can access the application here: Machine Learning for Global Wildlife Trafficking
It was designed by Raya Abourjeily, Dr. Neil Carter, Alex Hardy, and Ani Madurkar (all authors contributed equally). You can find Dr. Carter here: https://www.coexistencegroup.com/
This product leverages two datasets: LEMIS and Panjiva. The LEMIS dataset includes 15 years of data on the importation of wildlife and their derived products into the United States (2000–2014), originally collected by the United States Fish and Wildlife Service. The Panjiva dataset was manually downloaded through a paid Panjiva account and includes data on imported shipments (2007–2021) related to wildlife, covering HS codes 01, 02, 03, 04, and 05, as these represent animals and animal products.
The data used in this application is only about 1% (in total, before filtering down to some percentage of training data) of all the data available, due to GitHub and Streamlit size limitations.
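As a rough illustration of that downsampling step, a reproducible ~1% sample can be taken with pandas. This is a sketch, not the repository's actual code; the column names and the `downsample` helper are made up for the example.

```python
# Illustrative sketch (not from the repo): downsampling a cleaned
# dataframe to ~1% so it fits within GitHub/Streamlit size limits.
import pandas as pd

def downsample(df: pd.DataFrame, frac: float = 0.01, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of the dataframe."""
    return df.sample(frac=frac, random_state=seed)

# Toy data standing in for the cleaned LEMIS/Panjiva files
df = pd.DataFrame({"shipment_id": range(10_000), "hs_code": ["01"] * 10_000})
sample = downsample(df)
print(len(sample))  # 1% of 10,000 rows -> 100
```

Fixing `random_state` keeps the committed sample stable across runs.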
These instructions will get you a copy of the project up and running on your local machine.
Get a copy of this project by simply running the git clone command.
git clone https://github.com/AniMadurkar/Machine-Learning-for-Global-Wildlife-Trafficking.git
Before running the project, install all the dependencies from requirements.txt:
pip install -r requirements.txt
Lastly, launch the project on your local machine with a single command:
streamlit run streamlit_app.py
streamlit_app_demo1.mov
streamlit_app_demo2.mov
If you download the full LEMIS dataset, or happen to have a Panjiva subscription and download a large number of files, you can use our ETL scripts to clean each of them as well. All you need to do is set up your folder structure so that the folder containing the scripts has two additional folders:
- Lemis Data
- Panjiva Data
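The expected layout can be created in a couple of lines; the folder names below come from the instructions above, and creating them with Python is just one option.

```python
# Create the two data folders next to the ETL scripts
# ("Lemis Data" and "Panjiva Data" are the names the scripts expect).
from pathlib import Path

for folder in ("Lemis Data", "Panjiva Data"):
    Path(folder).mkdir(exist_ok=True)
```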
Drop only the data files into each of these folders respectively, and then run the scripts:
python lemis.py
python panjiva.py
Each script has code at the bottom that outputs a sample of the cleaned file. If you comment out the lines that take the sample, you can output the full cleaned dataframe to CSV instead.
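The sample-vs-full pattern described above looks roughly like this. The variable and file names here are hypothetical stand-ins, not the actual identifiers in lemis.py or panjiva.py.

```python
# Hypothetical sketch of the end-of-script pattern (names are
# illustrative, not the actual variables in lemis.py/panjiva.py).
import pandas as pd

cleaned = pd.DataFrame({"species": ["lion", "tiger"], "qty": [3, 5]})

# Sampling line -- comment this out to write the full cleaned dataframe
cleaned = cleaned.sample(n=1, random_state=0)

cleaned.to_csv("cleaned_sample.csv", index=False)
```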
- Provide functionality to view the predictions in an easier way
- Add filtering and visualizations in the main page to contextually evaluate the predictions
- Incorporate intuitive ways to add dates to the model (this will need further discussion with our stakeholders to handle properly)