# Prompt to Extract Key-values into JSON from W2 (PDF) using advanced mode

Below is an example of using AnyParser to extract key-values from a W2 PDF into JSON format. (Note: the model is still in beta, so repeated runs on the same file may not produce identical output. Please bear with it!)

### 1. Load the libraries

If you have not installed `any_parser` yet, uncomment the lines below and run them.

In [21]:
# !pip3 install python-dotenv
# !pip3 install --upgrade any-parser

In [22]:
import os
import pandas as pd

from dotenv import load_dotenv
from any_parser import AnyParser


### 2. Set up your AnyParser API key

To set up your `CAMBIO_API_KEY` API key:

1. create a `.env` file in your root folder;
2. add the following line to your `.env` file:
    ```
    CAMBIO_API_KEY=17b************************
    ```

Then run the cell below to load your API key.

In [23]:
load_dotenv(override=True)
example_apikey = os.getenv("CAMBIO_API_KEY")
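
`os.getenv` returns `None` when a variable is unset, so a missing or misnamed `.env` entry would only show up later as a failed API call. Below is a minimal sketch of a fail-fast check. `require_env` is a hypothetical helper introduced here for illustration; it is not part of `any_parser` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of any_parser or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

After `load_dotenv(override=True)`, calling `require_env("CAMBIO_API_KEY")` in place of `os.getenv("CAMBIO_API_KEY")` would raise immediately if the key was not loaded.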

### 3. Load sample data and Run AnyParser

AnyParser supports both images and PDFs. First, let's point to a sample file to test AnyParser's capabilities.

Now we can run AnyParser on our sample data and then display the extracted key-values.

In [24]:
example_local_file = "./sample_data/test1.pdf"
example_prompt = "Return table in a JSON format with each box's key and value."

op = AnyParser(example_apikey)
# mode can be "basic" or "advanced"
qa_result = op.parse(example_local_file, example_prompt, mode="advanced")


Upload response: 204
Extraction success.


In [25]:
qa_result

[[{"a Employee's social security number": '758-58-5787',
   'b Employer identification number (EIN)': '78-8778788',
   "c Employer's name, address, and ZIP code": 'DesignNext\nKatham Dorbosto, Kashiani, Gopalganj\nGopalganj, AK 8133',
   'd Control number': '9',
   "e Employee's first name and initial": 'Jesan',
   'e Last name': 'Rahaman',
   "f Employee's address and ZIP code": 'AL\n877878878',
   '1 Wages, tips, other compensation': '80000.00',
   '2 Federal income tax withheld': '10368.00',
   '3 Social security wages': '80000.00',
   '4 Social security tax withheld': '4960.00',
   '5 Medicare wages and tips': '80000.00',
   '6 Medicare tax withheld': '1160.00',
   '7 Social security tips': 'NA',
   '8 Allocated tips': 'NA',
   '10 Dependent care benefits': 'NA',
   '11 Nonqualified plans': 'NA',
   '13 Statutory Retroment employee Third-party sick pay plan': 'NA',
   '14 Other': 'NA',
   '15 State': 'AL',
   '16 State wages, tips, etc.': '80000.00',
   '17 State income tax': '3835
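
Because the model is in beta and its output can vary between runs, it may be worth checking the shape of the result before indexing into it. The sketch below assumes the nested `[[{...}]]` shape shown in the output above. `first_record` is a hypothetical helper, not part of the `any_parser` API.

```python
from typing import Any, Dict


def first_record(result: Any) -> Dict[str, str]:
    """Return the first key-value dict from a nested [[{...}]] result.

    Assumes the shape shown in the notebook output; raises ValueError otherwise.
    Hypothetical helper for illustration; not part of any_parser.
    """
    if not isinstance(result, list) or not result:
        raise ValueError("expected a non-empty outer list")
    inner = result[0]
    if not isinstance(inner, list) or not inner:
        raise ValueError("expected a non-empty inner list")
    record = inner[0]
    if not isinstance(record, dict):
        raise ValueError("expected a dict mapping box labels to values")
    return record
```

With this in place, `first_record(qa_result)` would either return the dictionary of W2 boxes or fail with a clear message when the model returns something else.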

In [27]:
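# qa_result is a nested list; the first inner list holds the extracted records
data = qa_result[0]
# Take the box labels and values from the first record
keys = list(data[0].keys())
values = list(data[0].values())

# Create a DataFrame with one row per W2 box
df = pd.DataFrame(values, index=keys, columns=['Value'])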

df

| | Value |
|---|---|
| a Employee's social security number | 758-58-5787 |
| b Employer identification number (EIN) | 78-8778788 |
| c Employer's name, address, and ZIP code | DesignNext\nKatham Dorbosto, Kashiani, Gopalga... |
| d Control number | 9 |
| e Employee's first name and initial | Jesan |
| e Last name | Rahaman |
| f Employee's address and ZIP code | AL\n877878878 |
| 1 Wages, tips, other compensation | 80000.00 |
| 2 Federal income tax withheld | 10368.00 |
| 3 Social security wages | 80000.00 |
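
All extracted values are strings, and empty boxes come back as `'NA'`. If the amounts are needed for arithmetic, one option is to convert them explicitly. The sketch below is one possible approach under that assumption. `to_amount` is a hypothetical helper, and the sample dictionary reuses three entries from the output above.

```python
from typing import Dict, Optional


def to_amount(value: str) -> Optional[float]:
    """Convert an amount string such as '80000.00' to a float.

    Returns None for 'NA' or any other text that float() cannot parse.
    Hypothetical helper for illustration; not part of any_parser.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Three entries copied from the extracted W2 output
sample: Dict[str, str] = {
    "1 Wages, tips, other compensation": "80000.00",
    "2 Federal income tax withheld": "10368.00",
    "7 Social security tips": "NA",
}

# Map each box label to a float, or None where the box was empty
amounts = {key: to_amount(value) for key, value in sample.items()}
```

This is best applied only to the numbered amount boxes, since identifier fields such as the control number would also parse as numbers.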


## End of the notebook

Check out more [case studies](https://www.cambioml.com/blog) from CambioML!

<a href="https://www.cambioml.com/" title="Title">
    <img src="./sample_data/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>