# Introduction to Textract - Free Cloud University

**Goal** The purpose of this lab is to give you hands-on practice interacting with Textract using Python. By the end of this lab, you should be able to:
* Extract text from an image uploaded to S3 using Textract

Before you begin, ensure that you have Python 3 installed by running the code block below.

In [None]:
!python3 --version

Run this code block to ensure that the boto3 library is installed.

In [None]:
!pip install boto3

Run this code block to import the necessary packages. 

In [None]:
#This module is necessary for interacting with AWS
import boto3

#We will be using these modules to create fake data
from math import sin
import matplotlib.pyplot as plt

Running the code block below sets the default region in which the boto3 library will create resources.

In [None]:
#We will be using US-East-1 as the default region
%env AWS_DEFAULT_REGION=us-east-1
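
The `%env` magic is only available inside a notebook. As a minimal sketch, assuming you run the lab as a plain Python script instead, the same default region could be set through `os.environ` before any boto3 client is created:

```python
import os

# Set the default region through the process environment.
# boto3 falls back to AWS_DEFAULT_REGION when a client is created
# without an explicit region_name argument.
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

print(os.environ["AWS_DEFAULT_REGION"])
```

The assignment must run before the clients are created, since the region is resolved at client creation time.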

Next, we will create a PNG file with matplotlib to use as the input image for Textract.

In [None]:
x, y = [], []
for i in range(0, 200):
    x.append(i)
    y.append(sin(i))

plt.plot(x, y)
plt.title('Relationship Between X and Sin of X')
plt.xlabel("X Values")
plt.ylabel("Sin of X")
plt.savefig('foo.png')

Finally, before we can interact with any AWS resources, we need to create the boto3 Textract client and S3 resource.

**Try It Out Yourself**: Fill in the missing `None` values with your AWS credentials and run the code block below.

In [None]:
#The keys match the keyword arguments that boto3 expects
creds = {
    'aws_access_key_id' : None,
    'aws_secret_access_key' : None
}

#Client used to call Textract
textract = boto3.client(
    'textract',
    aws_access_key_id=creds['aws_access_key_id'],
    aws_secret_access_key=creds['aws_secret_access_key']
)

#Resource used to create the bucket and upload the image
s3 = boto3.resource(
    's3',
    aws_access_key_id=creds['aws_access_key_id'],
    aws_secret_access_key=creds['aws_secret_access_key']
)
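
Typing credentials directly into a notebook makes them easy to leak. As a hedged alternative, the sketch below reads them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The `load_credentials` helper is hypothetical and not part of boto3, and the values in the demonstration are placeholders.

```python
import os

def load_credentials(env=os.environ):
    """Return boto3-style credential keyword arguments read from `env`.

    Raises KeyError if either variable is missing.
    """
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }

# Demonstration with placeholder values (not real credentials).
example_env = {
    "AWS_ACCESS_KEY_ID": "EXAMPLE_ID",
    "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET",
}
example_kwargs = load_credentials(example_env)
print(sorted(example_kwargs))
```

The returned dictionary can be unpacked into a client call, for example `boto3.client('textract', **load_credentials())`.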

Before we continue with the lab, create an S3 Bucket and upload the image created above by filling in the missing `None` values and running the code block below.

In [None]:
#First, let's create an S3 bucket and upload the PNG file
bucket = s3.Bucket(None)
bucket.create()

filename = 'foo.png'
with open(filename, 'rb') as data:
    bucket.upload_fileobj(data, 'foo.png')
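
Bucket creation fails if the name you fill in breaks the S3 naming rules, and the name must also be globally unique. The sketch below is a simplified pre-check covering only the most common rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent dots). The `looks_like_valid_bucket_name` helper is hypothetical, and it cannot tell you whether the name is already taken.

```python
import re

# Simplified pattern: 3-63 characters drawn from lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

def looks_like_valid_bucket_name(name):
    """Partial check of S3 bucket naming rules (illustrative only)."""
    if not isinstance(name, str):
        return False
    if ".." in name:
        return False
    return _BUCKET_NAME.fullmatch(name) is not None

print(looks_like_valid_bucket_name("my-textract-lab-123"))
print(looks_like_valid_bucket_name("My_Bucket"))
```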

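In order to extract the text of a file using Textract, use the `analyze_document` method. You need to set two parameters: the `Document` parameter, which contains the information about the file you want to extract text from, and the `FeatureTypes` parameter, which lists the types of analysis to perform (for example, `TABLES` or `FORMS`).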

In [None]:
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': None,
            'Name': None
        }
    },
    FeatureTypes=[
        None
    ]
)
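
The response returned by `analyze_document` contains a `Blocks` list, where each block has a `BlockType` (such as `PAGE`, `LINE`, or `WORD`) and, for text blocks, a `Text` field. A minimal sketch of pulling the detected lines out of the response might look like the following. The `extract_lines` helper is hypothetical, and `sample_response` is a hand-written stand-in rather than real Textract output.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]

# Hand-written stand-in for a real response, for illustration only.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Relationship Between X and Sin of X"},
        {"BlockType": "WORD", "Text": "Relationship"},
        {"BlockType": "LINE", "Text": "X Values"},
    ]
}

for line in extract_lines(sample_response):
    print(line)
```

With a real call, you would pass the `response` variable from the cell above to `extract_lines` instead of the sample dictionary.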