![title page](./images/title.png)

# Agenda

- Artificial Intelligence Overview
- Is AI taking over?
- Computational Law
- Moral & Ethical Dilemmas
- Data Collection
- Data Quality


- NLP Applied to Law
- Applied Example - Supreme Court Cases
- Homework
- Next Week - Deep Learning for NLP


![ai today](./images/ai-today.png)

- Using computers to solve problems or make automated decisions
- For tasks that, when done by humans, typically require intelligence


![](./images/narrow-ai.png)

Most business applications of artificial intelligence focus on a single, narrowly defined problem. These narrow AIs are good at optimizing specific tasks, such as recommending songs on Pandora or analyzing data to improve tomato growth in a greenhouse.

Examples:

- Smartphone apps
- Spam Filters
- Google Translate


![](./images/general-ai.png)

A general AI can handle tasks from different areas and origins. It can shorten training time from one area to the next by taking experience gathered in one area and applying it in another.

This knowledge transfer is only possible if there is a semantic connection between the areas. The stronger and denser the connection, the faster and easier the knowledge transfer.


![strong ai](./images/strong-ai.png)

A strong AI is a self-aware machine with ideal thoughts, feelings, conscience, and consciousness.

# Adoption of AI Systems

![](https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAArPAAAAJGUwMmI5N2RlLTkwNDItNDM5Mi04MjVlLWFjNGJlYWJhOGE1NA.png)

![](./images/ai-techniques.png)

![](./images/ml-tree.png)

- Latent variables (from the Latin present participle of *lateo*, "lie hidden"), as opposed to observable variables, are variables that are not directly observed but are instead inferred from observed variables through a mathematical model.

![](./images/hybrid-ai.png)

# Is AI Taking Over the Legal Industry?


![](./images/ai-law-headlines.png)

# Lawyer Task by Percentage of Time

![](https://blogs.thomsonreuters.com/answerson/wp-content/uploads/sites/3/2014/04/lawyer-task-percentage-time.jpg)

The impact on the profession, and its ability to respond, depends on two factors: percentage of work susceptible to automation and speed of change.

The Innovator's Dilemma thesis asserts:

If the portion of legal work that can be automated is closer to the lower estimates and innovation is introduced over a long period of time, the changes will likely be absorbed by an evolving practice. But the larger the share of work affected and the faster the pace of innovation, the greater the risk to the status quo and the greater the opportunity for innovators.

- Do you believe that drafting of pleadings and contracts can be automated through the creation of standards?
- Do you believe that legal analytics can effectively review precedent to identify common elements and patterns, namely the issues in a litigation matter or the elements of a business transaction?


![](./images/legal-ai.png)

# Computational Law

Computational law is the use of mechanical reasoning techniques to derive the consequences of facts and laws that have been represented in a form a computer can process.

## Data Driven Approach

- Prediction Models and Methods
- Natural Language Processing
- Visual Law
- Network Analytic Methods


## Logic Rules Based Approach

- Expert Systems
- Self-executing Law
- Computable Codes
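
As a rough illustration of the logic rules based approach, the sketch below derives conclusions from a set of facts by forward chaining, one common way expert systems reason. The rule set and fact names are hypothetical and invented for this example; they are not real statutory provisions or a complete model of contract law.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new conclusions can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when all of its premises are already known.
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules for illustration only: (premises, conclusion).
RULES = [
    (frozenset({"offer", "acceptance", "consideration"}), "contract_formed"),
    (frozenset({"contract_formed", "party_failed_to_perform"}), "breach"),
    (frozenset({"breach", "damages_shown"}), "remedy_available"),
]

facts = {"offer", "acceptance", "consideration", "party_failed_to_perform"}
conclusions = forward_chain(facts, RULES) - facts
print(sorted(conclusions))
```

With these facts the engine derives `contract_formed` and then `breach`, but not `remedy_available`, because `damages_shown` was never asserted. Real systems typically use richer rule languages, but the derive-until-nothing-changes loop is the core idea.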


# Quantitative Legal Prediction

![](http://media.economist.com/sites/default/files/cf_images/20050312/D1205TQ8.gif)

- Predict Case Outcomes
    - Data driven legal underwriting
- Predict Legal Costs
    - Data driven legal operations
- Predict Relevant Documents
    - Data driven e-discovery and due diligence
- Predict Rogue Behavior
    - Data driven compliance
- Predict Contract Terms and Outcomes
    - Data driven transactional work


# Expert Systems

![](./images/turbo-tax1.png)

![](./images/turbo-tax2.png)

![](./images/compliance.png)

![](./images/contract-creation.png)

- Automated legal form and contract creation
- Leads to greater public access to legal assistance
- Also used by legal aid entities to lower costs


![](./images/e-discovery.png)

![](./images/contract-analysis.png)

![](https://kirasystems.com/fp/images/screenshots/search-and-review.gif)

# Legal Chatbots

[![DONOTPAY](https://venturebeat.com/wp-content/uploads/2017/07/img_1409.jpg?fit=578%2C325&strip=all)](https://youtu.be/-rabJDCBUbY)

- Client acquisition
- Client intake
- Client interaction via website
- Providing general legal information to non-lawyers or the public


# Predict Case Outcomes and Justice Votes

![](https://computationallegalstudies.com/wp-content/uploads/2017/05/Screen-Shot-2017-05-03-at-12.31.25-AM.jpg)

![](./images/fantasy-scotus.png)

FantasySCOTUS is the leading Supreme Court Fantasy League. Thousands of attorneys, law students, and other avid Supreme Court followers make predictions about cases before the Supreme Court. Participation is free and Supreme Court geeks can compete for over $10,000 in cash prizes and donations.

![](./images/predictit.png)

![](./images/ai-ethics.png)

![](./images/asimov.png)

![](./images/moral-machine.png)

[![Moral Machine](https://i.ytimg.com/vi/vQUIM5ZH7Lo/maxresdefault.jpg)](https://youtu.be/XCO8ET66xE4)

# Bias in Machine Learning

![](http://alexanderhiggins.com/wp-content/uploads/2016/06/TwoPettyArrests.jpg)

- Biased data perpetuated in Machine Learning
- Recidivism risk


# How do we prevent misuse of data?

![](./images/data-pipeline.png)

![](./images/ethical-data-collection.png)

There are several ethical issues which must always be considered when planning any type of data collection. Data collection always costs someone something. It may cost health workers' time and energy to complete surveillance forms. It certainly costs the health coordinating organization money and time to collect, analyze, interpret, and disseminate surveillance data and results. Surveys are even more resource intensive. Data collection also costs the people in the population from which the data are collected a certain amount of time, discomfort, and potential harm.

In addition, implementing or revising programmes in response to the conclusions drawn from collected data always costs manpower, time, money, and other resources. If the conclusions are wrong because the data were poorly collected, these resources, which could have been used otherwise, may be wasted or inefficiently employed.


# Data Quality Assessment Workflow

![](./images/data-assessment.png)

# Sources of Poor Data Quality

1. Entry quality: Did the information enter the system correctly at the origin?

2. Process quality: Was the integrity of the information maintained during processing through the system?
 
3. Identification quality: Are two similar objects identified correctly to be the same or different? 

4. Integration quality: Is all the known information about an object integrated to the point of providing an accurate representation of the object? 

5. Usage quality: Is the information used and interpreted correctly at the point of access? 

6. Aging quality: Has enough time passed that the validity of the information can no longer be trusted? 

7. Organizational quality: Can the same information be reconciled between two systems based on the way the organization constructs and views the data?
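
A few of these checks can be automated. The sketch below is a minimal illustration covering entry quality (missing values), identification quality (likely duplicates), and aging quality (stale records). The record layout, field names, and thresholds are hypothetical, and the name normalization is deliberately crude.

```python
from datetime import date

# Hypothetical records; field names and values are invented for illustration.
RECORDS = [
    {"id": 1, "name": "Acme Corp.", "filed": date(2017, 3, 1), "amount": 1200.0},
    {"id": 2, "name": "acme corp", "filed": date(2017, 3, 1), "amount": 1200.0},
    {"id": 3, "name": "Beta LLC", "filed": date(2009, 6, 15), "amount": None},
]


def normalize(name):
    """Crude normalization so trivially different spellings compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def missing_values(records):
    """Entry quality: ids of records that have at least one missing field."""
    return [r["id"] for r in records if any(v is None for v in r.values())]


def likely_duplicates(records):
    """Identification quality: groups of ids whose normalized names match."""
    groups = {}
    for r in records:
        groups.setdefault(normalize(r["name"]), []).append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def stale(records, today, max_age_days):
    """Aging quality: ids of records older than max_age_days."""
    return [r["id"] for r in records if (today - r["filed"]).days > max_age_days]


print(missing_values(RECORDS))
print(likely_duplicates(RECORDS))
print(stale(RECORDS, today=date(2018, 1, 1), max_age_days=5 * 365))
```

The remaining categories (process, integration, usage, and organizational quality) depend on how data moves between systems and people, so they usually need audits rather than a single script.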

![](./images/quality-data.png)

# Natural Language Processing (NLP)

NLP is a branch of data science that provides systematic methods for analyzing, understanding, and deriving information from text data. With NLP and its components, one can organize large volumes of text, automate many tasks, and solve a wide range of problems, such as automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.


## Topic Modeling

![](http://bigdata.ices.utexas.edu/wp-content/uploads/2015/01/LDA-concept.png)

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structures in a body of text.

Topic modeling differs from rule-based text mining approaches that use regular expressions or dictionary-based keyword searches. It is an unsupervised approach for finding and observing groups of words (called "topics") in large collections of text.


## Dimensionality Reduction - LSA

![](https://image.slidesharecdn.com/rnlp-combined-130522042823-phpapp01/95/natural-language-processing-in-r-rnlp-53-638.jpg?cb=1369197002)

![](https://qph.ec.quoracdn.net/main-qimg-73690166b2cdd6c9ac4c4434b0f9f0e7)

## Non-Negative Matrix Factorization (NMF)


![](https://mmolano.files.wordpress.com/2014/10/nmf.png)

# TF-IDF
Term Frequency-Inverse Document Frequency


TF-IDF (Term Frequency-Inverse Document Frequency) is a text mining technique used to categorize documents. Have you ever looked at blog posts on a web site, and wondered if it is possible to generate the tags automatically? Well, that's exactly the kind of problem TF-IDF is suited for.

It is worth noting the differences between TF-IDF and sentiment analysis. Although both can be considered classification techniques for text, their goals are distinct. Sentiment analysis aims to classify documents by opinion, such as 'positive' or 'negative'. TF-IDF, by contrast, categorizes documents by the terms that are most distinctive within them. This gives insight into what the reviews are about, rather than whether the author was happy or unhappy. If we analyzed product review data from an e-commerce site selling computer parts, we would end up with groups of documents about 'laptop', 'mouse', 'keyboard', etc. We would learn a great deal about the types of reviews that had been written, but nothing about what the users thought of those products. Although the algorithms are similar in that they classify text, each gives us different insights.

This algorithm is useful when you have a document set, particularly a large one, which needs to be categorized. It is especially nifty because you don't need to train a model ahead of time and it will automatically account for differences in lengths of documents.

Imagine a large corporate website with tens of thousands of user-contributed blog posts. Depending on the tags attached to each blog post, the item will appear on listing pages on various parts of the site. Although the authors were able to tag things manually when they wrote the content, in many cases they chose not to, and therefore many blog posts are not categorized. In practice, only a small fraction of users take the time to manually add tags and assist with categorization of posts and reviews, making voluntary organization unsustainable. Such a document set is an excellent use case for TF-IDF, because it can generate tags for the blog posts and help us display them in the right areas of our site. Best of all, no intern would have to suffer through manually tagging them! A quick run of the algorithm would go through the document set and sort through all the entries, eliminating a great deal of hassle.

![](https://i.ytimg.com/vi/nHnML6fauDg/hqdefault.jpg)

TF-IDF computes a weight which represents the importance of a term inside a document. It does this by comparing how often the term is used in an individual document with how widely it is used across the entire data set (a collection of documents).

The importance increases proportionally to the number of times a word appears in the individual document itself. This is called Term Frequency. However, a word that appears many times in most documents (such as "the") would score highly everywhere without telling the documents apart. That's why TF-IDF also offsets this value by how many documents in the set contain the term, a value called Inverse Document Frequency.

$$\mathrm{TF}(t) = \frac{\text{number of times term } t \text{ appears in a document}}{\text{total number of terms in the document}}$$

$$\mathrm{IDF}(t) = \log_e\left(\frac{\text{total number of documents}}{\text{number of documents with term } t \text{ in it}}\right)$$

$$\text{Value} = \mathrm{TF}(t) \times \mathrm{IDF}(t)$$

TF-IDF is computed for each term in each document. Typically, you will be interested either in one term in particular (like a search engine), or in the terms with the highest TF-IDF in a specific document (such as when generating tags for blog posts).
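
The TF and IDF definitions above translate fairly directly into code. The following is a minimal sketch using only the Python standard library. The lowercase whitespace tokenization, the sample corpus, and the helper names `tf_idf` and `top_terms` are assumptions made for illustration; production libraries typically add smoothing and normalization that this sketch omits.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = TF * IDF."""
    # Assumption: naive tokenization by lowercasing and splitting on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms of one document (candidate tags)."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical mini-corpus of product reviews.
corpus = [
    "the laptop battery lasts all day",
    "the mouse stopped working after a day",
    "the keyboard keys feel great",
]
scores = tf_idf(corpus)
print(top_terms(scores[0], k=4))
```

Because "the" occurs in every document, its IDF is log(3/3) = 0 and its weight vanishes, while terms unique to one review, such as "laptop", receive the highest weights and make natural tag candidates.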



### Frequency of Words Plotted by Rank
![](https://www.federalreserve.gov/econresdata/notes/feds-notes/2015/gifjpg/meade-acosta-chart4-20150925.png)