# Tool Imports

In [1]:
import pandas as pd


# Overview

This notebook aims to scrape the US Department of Justice's Executive Office for Immigration Review (EOIR) website to obtain current details for each sitting judge.


## Objectives

- Explore data availability
- Determine whether additional resources are necessary
    - If so, find them
- Find product-relevant data: judge name, locale (district, court location, etc.), and, as a stretch goal, biography.
- Establish the best way to extract the data into a readable format for a backend endpoint
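
As a rough illustration of the last objective, the sketch below serializes judge records to JSON, one format a backend endpoint could consume. The `JudgeRecord` class, its field names, and the sample values are assumptions made for illustration; the actual schema has not been decided.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JudgeRecord:
    """Hypothetical record shape; the field names are assumptions."""
    name: str
    court_location: str
    biography: Optional[str] = None  # stretch goal, so it may be missing


def to_json(records):
    """Serialize a list of JudgeRecord objects to a JSON array string."""
    return json.dumps([asdict(record) for record in records], indent=2)


# Placeholder values for illustration only, not scraped data.
example = [JudgeRecord(name="Jane Doe", court_location="Example City")]
payload = to_json(example)
print(payload)
```

Leaving `biography` optional lets the same structure serve both the required fields and the stretch goal.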

## Considerations
1) Relevance: How are formerly practicing judges handled? An attorney may upload a case file for a judge who is no longer practicing. How will the system handle this use case?

2) Timeliness: How current is the information provided? 

3) Reliability: How reliable is the data being scraped? Given that the data is drawn from a .gov site, there is a strong likelihood that the information is accurate.

4) Scalability: Will the solution in this notebook scale?

Product Scope Note: This product formerly included appellate court cases, that is, cases where the original decision was appealed by either party and elevated to the next step in the immigration court judicial process. Please look for any reference to immigration appeals, BIA, board of appeals, etc., and ensure that information is not included unless otherwise requested by the stakeholder. An example of a website to exclude: https://www.justice.gov/eoir/board-of-immigration-appeals-bios

# Data Establishment

In [30]:
# Load every HTML table found on each page
# pd.read_html returns a list of DataFrames, one per table
district_data = pd.read_html('https://www.justice.gov/eoir/eoir-immigration-court-listing')

bio_data = pd.read_html('https://www.justice.gov/eoir/office-of-the-chief-immigration-judge-bios')


In [None]:
# Establish universal variables of key search words
# These variables can be used for string searches

accepted_words = ['judge', 'judges', 'eoir', 'immigration judge', 'immigration judges', 'district', 'district court', 'district courts']

# rejected_words can be used to verify that out-of-scope appellate content is excluded
rejected_words = ['appeal', 'appeals', 'bia', 'appellate']
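
One way these lists might be used for string searches is sketched below. It matches whole words rather than substrings, because a plain substring test for `'bia'` would also flag text such as "Columbia". The helpers `contains_any` and `is_in_scope` are hypothetical names, not part of the notebook, and the lists are repeated only so the block runs on its own.

```python
import re

accepted_words = ['judge', 'judges', 'eoir', 'immigration judge', 'immigration judges',
                  'district', 'district court', 'district courts']
rejected_words = ['appeal', 'appeals', 'bia', 'appellate']


def contains_any(text, words):
    """Return True if any term in `words` occurs in `text` as a whole word or phrase."""
    for word in words:
        # \b anchors the match to word boundaries; re.escape guards special characters
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def is_in_scope(text):
    """Accept text that mentions an accepted term and no rejected term."""
    return contains_any(text, accepted_words) and not contains_any(text, rejected_words)


print(is_in_scope("Office of the Chief Immigration Judge"))
print(is_in_scope("Board of Immigration Appeals judges"))
```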


In [53]:
# Set the maximum column width to None so cell contents are not truncated when displayed
pd.set_option('display.max_colwidth', None)

# Source for this option:
# https://towardsdatascience.com/8-commonly-used-pandas-display-options-you-should-know-a832365efa95


In [58]:
# Brief data content overview

# Number of tables parsed from each page
print('The length of the district data is:', len(district_data))
print('The length of the biography data is:', len(bio_data))

# View the first table parsed from each page
print('District data example:', district_data[0])
print('Biography data example:', bio_data[0])


The length of the district data is: 31
The length of the biography data is: 1
District data example:                                                                                                                                                                                                                                                                                                                                                                0
0  Arizona | California | Colorado | Connecticut | Florida | Georgia | Hawaii | Illinois  Louisiana | Maryland | Massachusetts | Michigan | Minnesota | Missouri Nebraska | Nevada | New Jersey | New Mexico | New York | North Carolina Northern Mariana Islands | Ohio | Oregon | Pennsylvania | Puerto Rico | Tennessee  Texas | Utah | Virginia | Washington
Biography data example:                             0                           1
0                Access  ECAS                Access  ICOR
1  Find Legal  Representation         Submit a  Compl

In [None]:
# Move forward with district_data

In [79]:
# Check to ensure required terms are present

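def word_check(words):
    """Return True if any item in `words` is one of the required search terms."""

    required_words = ['judge', 'immigration judge', 'judges', 'immigration judges', 'district', 'district court']

    # Expects an iterable of strings; each item is compared against required_words
    for word in words:
        if word in required_words:
            return True
    return False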


In [80]:
word_check(district_data)

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
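
The `ValueError` arises because `district_data` is a list of DataFrames, so each item in the loop is a DataFrame. The membership test then compares a whole DataFrame with each string, and the resulting DataFrame of booleans has no single truth value. Below is a minimal sketch of one possible fix, assuming the intent is to check whether any search term appears anywhere in the table text. The names `table_text` and `word_check_tables` and the sample tables are hypothetical.

```python
import pandas as pd

required_words = ['judge', 'immigration judge', 'judges', 'immigration judges',
                  'district', 'district court']


def table_text(table):
    """Join every cell value of a DataFrame into one lowercase string.

    Column labels are not included, only cell values.
    """
    return " ".join(str(value) for value in table.to_numpy().ravel()).lower()


def word_check_tables(tables):
    """Return True if any required term occurs in the text of any table."""
    return any(
        word in table_text(table)
        for table in tables
        for word in required_words
    )


# Made-up stand-in for the list returned by pd.read_html.
sample_tables = [
    pd.DataFrame({0: ["Arizona | California"]}),
    pd.DataFrame({0: ["Assistant Chief Immigration Judge"]}),
]
print(word_check_tables(sample_tables))
```

This version uses a substring test on the flattened text, so it reports whether a term is present somewhere but not which table or cell contains it.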

In [81]:
district_data.item()

AttributeError: 'list' object has no attribute 'item'
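
The `AttributeError` occurs because `district_data` is a plain Python list, and lists have no `.item()` method; the suggestions in the earlier error message apply to pandas objects, not to the list that holds them. A small sketch of how the individual tables could be inspected instead, using a hypothetical helper `summarize_tables` and made-up sample tables:

```python
import pandas as pd


def summarize_tables(tables):
    """Return (index, row count, column count) for each DataFrame in a list."""
    return [(i, table.shape[0], table.shape[1]) for i, table in enumerate(tables)]


# Made-up stand-in for the list returned by pd.read_html.
sample_tables = [
    pd.DataFrame({0: ["a"]}),
    pd.DataFrame({0: ["b", "c"], 1: ["d", "e"]}),
]
for index, rows, cols in summarize_tables(sample_tables):
    print(f"table {index}: {rows} rows x {cols} columns")
```

Run against `district_data`, a summary like this could help narrow down which of the 31 tables are worth a closer look.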

In [None]:
# Goal
# Find the string that includes the judges' names
# Split the string by spaces
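
The goal above could be sketched roughly as follows. The sample strings are invented placeholders, since the layout of the real page text has not been confirmed, and `find_judge_strings` is a hypothetical helper name.

```python
def find_judge_strings(strings):
    """Keep only the strings that mention 'judge' (case-insensitive)."""
    return [s for s in strings if "judge" in s.lower()]


# Made-up placeholder text, not scraped data.
cells = [
    "Court Administrator: Jane Doe",
    "Immigration Judge: John  Q. Public",
]

matches = find_judge_strings(cells)

# str.split() with no argument splits on any run of whitespace,
# so the doubled spaces seen in the scraped output do not produce empty tokens.
tokens = matches[0].split()
print(tokens)
```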