This [notebook](https://crosscompute.com/n/G1O5D8XrUWgVliOd2lbDdIrINe3j2nH2) was presented by Aida Shoydokova in a workshop on [Computational Approaches to Fight Human Trafficking](https://www.meetup.com/spatiotemporal-analysis-for-community-health-and-safety/events/244179401).

# Problem Statement

### There is a huge need for solutions that improve the collection, measurement, analysis, sharing and transparency of data on Human Trafficking [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]. Significant gaps remain in our knowledge of how to prevent human trafficking. Additional efforts and resources for research, data collection, and evaluation are needed to identify the actions that are most effective at preventing victimization [[2](https://www.state.gov/documents/organization/258876.pdf)].


### Current State of Human Trafficking:

* **Insufficient data**. All organisations, and businesses in particular, collect too little quality data on potential incidents of slavery, on how slavery affects them, and vice versa. There is no clarity on what data is required or where the gaps in the data are. Reliable data collection from victims and survivors could help build the best datasets [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]
* **Limited data sharing**. Datasets are used in silos and are not efficiently shared across law enforcement and civil society organisations to improve the collective response. Concerns about data privacy and security inhibit systematic sharing of data. A better understanding of how to use existing regulatory regimes in a manner consistent with human rights standards would be beneficial [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]
* **Inconsistent measurement**. Analysis of datasets is only worthwhile if sufficient data points can be accurately compared. This relies on the quality and consistency of how data points are measured. Few agreed norms and standards for measuring modern slavery characteristics exist at local, regional, national and international levels [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)] 
* **No comprehensive analysis**. There is a need for analytical tools that collate datasets from different sources and apply AI/data science to spot connections [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]. Big data analytics could help identify and analyse the migration flows of vulnerable people and detect patterns [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]

### Areas for Process Improvement

* Build reliable baseline information, data, and research that illuminate the causes, prevalence, characteristics, trends, and consequences of all forms of human trafficking in various countries and cultures [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Measure the impact of anti-trafficking prevention strategies to accurately assess policies and assistance programs, including their unintended negative consequences [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Identify populations vulnerable to human trafficking [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Develop a more comprehensive understanding of root causes that are specific to states, communities, and cultural contexts [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Understand unique vulnerabilities or breakdowns [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Find trends [[2](https://www.state.gov/documents/organization/258876.pdf)]
* Understand migration in source and destination countries, as well as along migration routes [[2](https://www.state.gov/documents/organization/258876.pdf)]

### Challenges aka Opportunities 
* Most of the populations relevant to the study of human trafficking are part of a “hidden population”, i.e. it is almost impossible to establish a sampling frame and draw a representative sample of the population. [[1](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)]
* Given the complex nature of human trafficking, it is difficult to amass reliable data to document local, regional, and global prevalence [[2](https://www.state.gov/documents/organization/258876.pdf)]. 

# Ideas / Hypotheses to Test

* Find breakdowns or areas of vulnerability in the system (country, community, or culture) that expose people to Human Trafficking
* Measure the existing policies on Human Trafficking and find out whether they are effective
* Understand the process of Human Trafficking and find breakdowns in it that could help current victims or combat Human Trafficking:
    * Process:  recruitment, transportation, transferring, harboring, or receiving of a person.
    * Ways and Means:  threat, coercion, abduction, fraud, deceit, deception, or abuse of power.
    * Goal:  prostitution, pornography, violence and sexual exploitation, forced labor, involuntary servitude, debt bondage, or slavery
* Study three stages of trafficking: the number of people in each stage, their characteristics, their probability of entering the next stage, and how they move from one stage to another:
    * Persons at risk of being trafficked,
    * Current victims of trafficking, and
    * Former victims of trafficking
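The "probability of entering the next stage" can be estimated as an empirical transition probability: among people observed in one stage, the share whose next observed stage is a given stage. A minimal sketch, assuming hypothetical per-person stage histories (the stage labels and the example histories are illustrative, not taken from any real dataset):

```python
from collections import Counter, defaultdict

# Illustrative labels for the three stages listed above.
AT_RISK, VICTIM, FORMER_VICTIM = "at_risk", "current_victim", "former_victim"


def transition_probabilities(histories):
    """Estimate P(next stage | current stage) from ordered stage histories.

    Each history is a list of stages observed for one person, in time order.
    Returns {from_stage: {to_stage: probability}}.
    """
    counts = defaultdict(Counter)
    for history in histories:
        # Count every consecutive (current, next) pair in the history.
        for current, nxt in zip(history, history[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Hypothetical histories for four people; the last one is re-victimized.
histories = [
    [AT_RISK, VICTIM, FORMER_VICTIM],
    [AT_RISK, AT_RISK],
    [AT_RISK, VICTIM],
    [VICTIM, FORMER_VICTIM, VICTIM],
]
probs = transition_probabilities(histories)
print(probs[AT_RISK])
```

A `former_victim -> current_victim` transition corresponds to a repeat victim. Because most of these populations are hidden, histories from any single source will be a biased sample, so such estimates are best treated as descriptive.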

# Solution aka 3 Steps

## 1. Data Extraction

### Human Trafficking Data Sources

| Data Source | Status | Link to Data | Link to Code | Solution |
|:----------- |:-----: |--------------|:------------|---------|
| News | Not Started | | | crawler |
| T Visa | Not Started | [csv/pdf](https://www.uscis.gov/tools/reports-studies/immigration-forms-data/data-set-form-i-914-application-t-nonimmigrant-status) | | |
| U Visa | Not Started | [csv/pdf](https://www.uscis.gov/tools/reports-studies/immigration-forms-data/data-set-form-i-918-application-u-nonimmigrant-status) | | |
| SHERLOC (UNODC) | Not Started | | | crawler |
| Vacatur news | Not Started | | | crawler |
| Court cases | Not Started | [Supreme Court API](https://free.law/supreme-court-data/)<br>[LexisNexis](https://www.lexisnexis.com/en-us/products/courtlink-for-corporate-or-professionals.page) | | API<br>crawler? |
| DOJ Press Releases | Done | [API](https://www.justice.gov/developer/api-documentation/api_v1) | [github](Human-Trafficking/data_collection/doj_press_releases.py) | API |
| FBI Crime Data | Done | [API](https://crime-data-explorer.fr.cloud.gov/api) | [github](Human-Trafficking/data_collection/crime_data_explore.py) | API |
| Data.gov | Not Started | [API](https://www.data.gov/developers/apis) | | API |
| Social Media | Not Started | [Twitter API](https://developer.twitter.com/en/docs)<br>[Reddit API](https://www.reddit.com/dev/api/) | | tweepy package<br>praw package |

* **Google News Search by Keywords**: Human Trafficking, Trafficking in Persons, Human Smuggling, U Visa, T Visa. [Look here for the logic that you can apply to filter the search](https://support.google.com/news/answer/3334?co=GENIE.Platform%3DDesktop&hl=en)
* Look up **T Visa and U Visa data**
    * T Nonimmigrant Status (T Visa) [Statistics in csv and pdf formats](https://www.uscis.gov/tools/reports-studies/immigration-forms-data/data-set-form-i-914-application-t-nonimmigrant-status) - T nonimmigrant status provides immigration protection to victims of trafficking. The T Visa allows victims to remain in the United States and assist law enforcement authorities in the investigation or prosecution of human trafficking cases.
    * U Nonimmigrant Status (U Visa) [Statistics in csv and pdf formats](https://www.uscis.gov/tools/reports-studies/immigration-forms-data/data-set-form-i-918-application-u-nonimmigrant-status) - U nonimmigrant status provides immigration protection to crime victims who have suffered substantial mental or physical abuse as a result of the crime. The U Visa allows victims to remain in the United States and assist law enforcement authorities in the investigation or prosecution of the criminal activity.
* Look at the **United Nations Office on Drugs and Crime (UNODC)** websites:
    * [SHERLOC UNODC database of trafficking in persons cases](https://www.unodc.org/cld/v3/sherloc/cldb/search.html?tmpl=sherloc&lng=en#?c=%7B%22filters%22:%5B%7B%22fieldName%22:%22__el.caseLaw.crimeTypes_s%22,%22value%22:%22traffickingPersonsCrimeType%22%7D%5D,%22sortings%22:%22%22,%22match%22:%22%22%7D) - a database of crime cases by country, filtered in this link to the trafficking in persons crime type
    * [UNODC Human Trafficking Knowledge Portal](https://www.unodc.org/cld/en/v3/htms/index.html) - a portal on Human Trafficking cases by country
* Look at **vacatur laws**, which decriminalize sex trafficking survivors by allowing them to vacate convictions that resulted from being trafficked. Human trafficking is an international problem involving the transportation and sale of forced human labor; sex trafficking, the form these laws address, involves forcing people, usually women and minors, into commercial sex.
* Look at **online sources of court cases**
    * **[Supreme Court API](https://free.law/supreme-court-data/)**
    * **[LexisNexis](https://www.lexisnexis.com/en-us/products/courtlink-for-corporate-or-professionals.page)**
* **[Department of Justice API](https://www.justice.gov/developer/api-documentation/api_v1)** - a good API, used here to collect DOJ press releases
    * [code for this API](Human-Trafficking/data_collection/doj_press_releases.py)
* **[FBI Crime Data API](https://crime-data-explorer.fr.cloud.gov/api)** - high-level statistics on Human Trafficking
    * [code for this API](Human-Trafficking/data_collection/crime_data_explore.py)
* **[Data.gov API](https://www.data.gov/developers/apis)** - search for human trafficking
* **Social Media** - Twitter and Reddit data via the tweepy and praw packages (code still to be added to the GitHub repository)
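The Google News keyword search in the first bullet can be written as one query by quoting multi-word phrases and joining the terms with `OR`, assuming Google News honours the standard Google search operators (exact-phrase quotes and `OR`). A minimal sketch; the helper name `build_news_query` is hypothetical:

```python
from urllib.parse import quote_plus

KEYWORDS = [
    "Human Trafficking",
    "Trafficking in Persons",
    "Human Smuggling",
    "U Visa",
    "T Visa",
]


def build_news_query(keywords):
    """Quote multi-word phrases so they match exactly, then OR them together."""
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return " OR ".join(terms)


query = build_news_query(KEYWORDS)
print(query)
print(quote_plus(query))  # URL-encoded form, e.g. for a crawler's start URL
```

Quoting matters here: without quotes, `U Visa` would also match articles that merely contain the letter "U" and the word "visa" separately.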


### Data Collection Tools
* **[Examples of Python Scripts for REST APIs](https://github.com/AShoydokova/Human-Trafficking/tree/master/data_collection)** - two scripts that get data from REST APIs
* **[Scrapy Tutorial to create web crawlers](https://doc.scrapy.org/en/latest/intro/tutorial.html)** - official tutorial
    * [Examples of Scrapy web crawlers](https://github.com/AShoydokova/Web-Crawlers/tree/master/simple_examples_scrapy)
* **[HANDLING JAVASCRIPT IN SCRAPY WITH SPLASH](https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/)** - you need more than Scrapy if a website is heavily populated by JavaScript. This approach mimics a browser using Splash, a JavaScript rendering service
    * [Examples of Scrapy and Splash web crawlers](https://github.com/AShoydokova/Web-Crawlers/tree/master/examples_scrapy_splash)
* **[Web Scraping Google with Selenium](http://www.marinamele.com/selenium-tutorial-web-scraping-with-selenium-and-python)** - you need more than Scrapy if a website is heavily populated by JavaScript. This approach mimics a browser using Selenium, which automates a real browser. You can also use Selenium without Scrapy to scrape data
    * [Steps to make Selenium work with python](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)
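REST API collectors usually follow the same loop: request a page of JSON results, keep the records, and request the next page until an empty or short page comes back. A minimal standard-library sketch of that loop, assuming a hypothetical endpoint that accepts `page` and `pagesize` query parameters and returns a JSON object with a `results` list. The real DOJ and FBI APIs use their own parameter and field names (and the linked scripts may paginate differently), so check their documentation before adapting it:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def collect_pages(base_url, page_size=50, max_pages=100, fetch=fetch_json):
    """Yield records from a paginated JSON API.

    Assumes (hypothetically) that the API accepts `page` and `pagesize`
    query parameters and returns {"results": [...]}.
    """
    for page in range(max_pages):
        query = urlencode({"page": page, "pagesize": page_size})
        payload = fetch(f"{base_url}?{query}")
        records = payload.get("results", [])
        if not records:
            break  # an empty page means there is nothing left
        yield from records
        if len(records) < page_size:
            break  # a short page means this was the last one
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access, and `max_pages` caps the crawl if an API never returns an empty page.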

### Data Sources on Crime Data
* **[SpotCrime API](https://www.programmableweb.com/api/spotcrime)** - the service provides maps at the neighborhood level representing data on criminal incidents from city and county police and other authorities
    * [Unofficial SpotCrime API](https://github.com/skyline-ai/spotcrime)
* **[NYC Open Data. Public Safety](https://data.cityofnewyork.us/browse?category=Public+Safety&provenance=official)**    
* **[SF Open Data. Public Safety](https://data.sfgov.org/browse?category=Public+Safety)**
* **[Bureau of Justice Statistics API](https://www.bjs.gov/developer/ncvs/index.cfm)**
* **[UK Police API](https://data.police.uk/docs/)**
* **[The list of Crime Data APIs](https://www.programmableweb.com/category/crime/api)**
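Once incidents are downloaded from one of these sources, a common first exploratory step is counting them by category and month. A minimal sketch, assuming each record is a dict with `category` and `month` fields (street-level crime records from the UK Police API are shaped roughly like this, but verify the field names for whichever source you use); the sample records are made up:

```python
from collections import Counter


def count_by_category_and_month(records):
    """Count incidents per (category, month) pair, skipping incomplete records."""
    return Counter(
        (r["category"], r["month"])
        for r in records
        if r.get("category") and r.get("month")
    )


records = [
    {"category": "violent-crime", "month": "2017-01"},
    {"category": "violent-crime", "month": "2017-01"},
    {"category": "other-crime", "month": "2017-02"},
    {"category": "other-crime"},  # missing month: ignored
]
counts = count_by_category_and_month(records)
for (category, month), n in counts.most_common():
    print(category, month, n)
```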

## 2. Data Processing (Information Retrieval) 

### Entity Extraction

* **Category and Subcategory of Human Trafficking**:
    * Sex Trafficking
        * Adult Sex Trafficking
        * Child Sex Trafficking
    * Labor
        * Bonded Labor or Debt Bondage
        * Domestic Servitude
        * Forced Child Labor
        * Unlawful Recruitment and Use of Child Soldiers
    * Organ Removal
    * Not Human Trafficking Article
    * Something else
* **Date**
    * Publication Date 
    * Conviction Date
    * Incident Start Date
    * Incident End Date 
* **Geo-Political Location**    
    * Country where a trafficker was operating
    * Country of origin of victim
    * Country of origin of trafficker
    * State/Province where a trafficker was operating
    * State/Province of origin of victim
    * State/Province of origin of trafficker
    * City where a trafficker was operating
    * City of origin of victim
    * City of origin of trafficker
* **"ID Information"** - information that might help to dedupe incidents
    * Trafficker name
    * Victim Name
* **Demographic Information** 
    * Victim race
    * Trafficker race
    * Ethnicity of trafficker
    * Ethnicity of victim
    * Victim Age
    * Trafficker Age
    * Victim Gender
    * Trafficker Gender
    * Victim's Level of education
    * Trafficker's Level of education
    * Occupation of trafficker
    * Prior occupation of victim
    * Post occupation of victim
    * Victim's Income level
    * Trafficker's Income level
    * Victim's Marital status
    * Trafficker's Marital status
    * Religion of victim
    * Religion of trafficker
* **Length of Human Trafficking**
    * How long was a victim harbored?
    * How long did a trafficker operate?
* **How was a victim recruited?** 
    * threat
    * coercion
    * abduction
    * fraud/deceit/deception
    * abuse of power
    * something else
* **How was a victim transported/transferred?**
* **How did a victim escape?**
* **Is it a repeat victim?**
* **Is it a repeat trafficker?**
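One way to keep extraction output consistent across sources is a record type whose fields mirror the entity list above. A partial sketch: the class, enum and field names are hypothetical, and only a subset of the listed fields is shown. Optional fields default to `None` so that "not mentioned in the article" stays distinguishable from an explicit `False` or empty value:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Category(Enum):
    ADULT_SEX_TRAFFICKING = "adult_sex_trafficking"
    CHILD_SEX_TRAFFICKING = "child_sex_trafficking"
    DEBT_BONDAGE = "debt_bondage"
    DOMESTIC_SERVITUDE = "domestic_servitude"
    FORCED_CHILD_LABOR = "forced_child_labor"
    CHILD_SOLDIERS = "child_soldiers"
    ORGAN_REMOVAL = "organ_removal"
    NOT_TRAFFICKING = "not_trafficking"
    OTHER = "other"


class Recruitment(Enum):
    THREAT = "threat"
    COERCION = "coercion"
    ABDUCTION = "abduction"
    FRAUD = "fraud"  # fraud/deceit/deception
    ABUSE_OF_POWER = "abuse_of_power"
    OTHER = "other"


@dataclass
class TraffickingIncident:
    source_url: str
    category: Category
    publication_date: Optional[date] = None
    conviction_date: Optional[date] = None
    incident_start: Optional[date] = None
    incident_end: Optional[date] = None
    operating_country: Optional[str] = None
    victim_origin_country: Optional[str] = None
    trafficker_names: List[str] = field(default_factory=list)  # for deduping
    recruitment: List[Recruitment] = field(default_factory=list)
    repeat_victim: Optional[bool] = None
    repeat_trafficker: Optional[bool] = None

    def duration_days(self) -> Optional[int]:
        """Length of the incident in days, if both dates were extracted."""
        if self.incident_start and self.incident_end:
            return (self.incident_end - self.incident_start).days
        return None


incident = TraffickingIncident(
    source_url="https://example.org/article",  # placeholder URL
    category=Category.DOMESTIC_SERVITUDE,
    incident_start=date(2015, 3, 1),
    incident_end=date(2016, 3, 1),
    recruitment=[Recruitment.FRAUD],
)
print(incident.duration_days())
```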

### Tools for Entity Extraction

* [spaCy](https://spacy.io/) - an NLP package
* [NLTK](http://www.nltk.org/) - an NLP package
* [Gensim](https://radimrehurek.com/gensim/) - topic modeling for humans
* [Polyglot](https://pypi.python.org/pypi/polyglot) - an NLP package
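Before training a model with these packages, a keyword baseline for the category field gives something to compare against, and its misses show where an NLP model is actually needed. A minimal sketch using plain keyword matching (not spaCy, NLTK, Gensim or Polyglot); the keyword lists are illustrative assumptions, not a validated lexicon, and articles with no match are returned as `None` for manual review:

```python
import re

# Illustrative keyword lists per category; a real lexicon would need review.
CATEGORY_KEYWORDS = {
    "sex_trafficking": ["sex trafficking", "prostitution", "commercial sex"],
    "labor_trafficking": ["forced labor", "debt bondage", "domestic servitude"],
    "organ_removal": ["organ removal", "organ trafficking"],
}


def classify(text):
    """Return the category whose keywords occur most often, or None if none match."""
    lowered = text.lower()
    scores = {
        category: sum(
            len(re.findall(r"\b" + re.escape(k) + r"\b", lowered)) for k in keywords
        )
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("He was convicted of sex trafficking and promoting prostitution."))
```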

### Evaluation of Entity Extraction ([taken from here](http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/))
#### Manually labeling a random sample
**Confusion Matrix**
![confusion matrix](http://www.scikit-yb.org/en/latest/_images/confusion_matrix.png)
* Accuracy: Overall, how often is the classifier correct?
    * (TP+TN)/Total 
* Misclassification Rate: Overall, how often is it wrong?
    * (FP+FN)/Total 
    * equivalent to 1 minus Accuracy
    * also known as "Error Rate"
* True Positive Rate: When it's actually yes, how often does it predict yes?
    * TP/Actual True = TP/(FN+TP)
    * also known as "Sensitivity" or "Recall"
* False Positive Rate: When it's actually no, how often does it predict yes?
    * FP/Actual False = FP/(TN+FP)
* Specificity: When it's actually no, how often does it predict no?
    * TN/Actual False = TN/(TN+FP) 
    * equivalent to 1 minus False Positive Rate
* Precision: When it predicts yes, how often is it correct?
    * TP/Predicted True = TP/(FP+TP) 
* Prevalence: How often does the yes condition actually occur in our sample?
    * Actual True/Total
* F Score: a weighted harmonic mean of the true positive rate (recall) and precision; the common F1 score weights them equally
    * 2*(Precision*Recall)/(Precision+Recall)
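The formulas above can be computed directly from the four cells of the confusion matrix once a random sample has been labeled by hand. A minimal sketch; the counts in the example are illustrative, and the function assumes no denominator is zero:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the confusion-matrix metrics listed above from raw counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)  # true positive rate, sensitivity
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "prevalence": (tp + fn) / total,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts from manually labeling a sample of 165 extractions.
metrics = classification_metrics(tp=100, fp=10, fn=5, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For entity extraction, precision and recall are usually more informative than accuracy, because most candidate spans in an article are not entities and true negatives can dominate the total.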

## 3. Building Models

### Ideas for models
* Semi-Supervised Learning:
    * [Semi-Supervised Learning with Ladder Networks Paper](https://arxiv.org/abs/1507.02672)
    * [Deconstructing the Ladder Network Architecture](https://arxiv.org/abs/1511.06430)
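Ladder networks are one semi-supervised option. A much simpler baseline built on the same premise (many unlabeled articles, few labeled ones) is self-training: fit on the labeled data, pseudo-label the unlabeled examples the model is most confident about, and refit. A minimal sketch of self-training (not a ladder network) with a small bag-of-words naive Bayes classifier in plain Python; the threshold, iteration count and toy data are illustrative assumptions. scikit-learn also provides a `SelfTrainingClassifier` in `sklearn.semi_supervised` that wraps its own estimators.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability} using Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.9, max_iter=5):
    """Self-training: repeatedly add confident pseudo-labels and refit."""
    docs = [text.lower().split() for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = [text.lower().split() for text in unlabeled]
    for _ in range(max_iter):
        model = train_nb(docs, labels)
        remaining = []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                docs.append(tokens)  # accept the pseudo-label
                labels.append(label)
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):
            break  # nothing new was confident enough
        pool = remaining
    return train_nb(docs, labels)


labeled = [
    ("forced labor debt bondage farm", "labor"),
    ("sex trafficking prostitution ring", "sex"),
]
unlabeled = ["workers held in debt bondage", "police busted a prostitution ring"]
model = self_train(labeled, unlabeled, threshold=0.6)
print(predict_proba(model, "debt bondage".split()))
```

The confidence threshold is the key design choice: too low and early mistakes are reinforced as pseudo-labels, too high and the unlabeled data is never used.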

### Tools for building models

* [scikit-learn](http://scikit-learn.org/stable/)
* [TensorFlow](https://www.tensorflow.org/)
* [Gensim](https://radimrehurek.com/gensim/)

## Resources

### Information Resources
* [The Human Trafficking Pro Bono Legal Center](http://www.htprobono.org/resources-publications-library/) - a lot of useful links
* [Trafficking in Persons Report by U.S. Department of State 2017](https://www.state.gov/documents/organization/271339.pdf)
* [Data and research on human trafficking: A global survey 2005](http://publications.iom.int/system/files/pdf/global_survey.pdf)
* [Measuring Human Trafficking](https://www.amazon.com/Measuring-Human-Trafficking-Complexities-Pitfalls-ebook/dp/B0014TSRG0) - link to Amazon.com

### Additional Resources
* [The Slavery Research Library](http://freedomfund.org/programs/community-building/slavery-research-library/?mc_cid=237fb684f9&mc_eid=5ff9a312c9) - contains all articles featured to date in the Slavery Research Bulletin as well as other selected documents. Designed to provide easy access to the leading research on slavery related issues and allow users to search by keyword, region or type of research.
* [Domestic Workers Statistics 2013](http://www.ilo.org/public/libdoc/ilo/2013/113B09_2_engl.pdf) - estimates, methodology, and global and regional statistics on the number of domestic workers

## References
* 1 - [Report on the role of digital technology in tackling modern slavery (14 June 2017)](https://www.wiltonpark.org.uk/wp-content/uploads/WP1546-Report.pdf?mc_cid=f863d1d5d4&mc_eid=3e6d3fcc42)
* 2 - [Trafficking in Persons Report by U.S. Department of State 2016](https://www.state.gov/documents/organization/258876.pdf)