

Welcome to the CV Parsing Project! This notebook shows how to parse CVs with the CV Parser API on RapidAPI using two methods: sending the file as a base64 string, or providing a public download link. The supported file types are PDF, DOC, DOCX, JPG, JPEG, and PNG.


### Key Points
- **File Upload Methods**:
  - **Base64 String**: Encode the file to base64 and send it in the `data_bytes` field of the request.
  - **Public Download Link**: Provide a publicly accessible URL from which the file can be downloaded.
- **Supported File Extensions**:
  - **Documents**: PDF, DOC, DOCX
  - **Images**: JPG, JPEG, PNG
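The two upload methods differ only in the request body. Here is a minimal sketch that assumes the field names used in this notebook's code cells (`data_bytes`, `file_url`, `isbytes`, `file_type`). The hypothetical helper `build_payload` assembles either body shape and rejects unsupported file types. It is not part of the API itself.

```python
# Hypothetical helper (not part of the CV Parser API). Field names are taken
# from the request bodies used in this notebook's code cells.
SUPPORTED_TYPES = {"pdf", "doc", "docx", "jpg", "jpeg", "png"}


def build_payload(file_type, data_bytes=None, file_url=None):
    """Build a request body for exactly one of the two upload methods."""
    file_type = file_type.lower()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {file_type!r}")
    # Exactly one source must be given: base64 content or a public link.
    if (data_bytes is None) == (file_url is None):
        raise ValueError("Provide exactly one of data_bytes or file_url")

    if data_bytes is not None:
        # Base64 method: the file content travels inside the request body.
        return {"data_bytes": data_bytes, "isbytes": True, "file_type": file_type}
    # Public-link method: the service downloads the file from file_url.
    return {"file_url": file_url, "isbytes": False, "file_type": file_type}
```

Because the helper rejects an unsupported type before any request is sent, an unsupported extension fails locally with a clear message.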

### Getting RapidAPI Credentials

Follow these steps to subscribe to the CV Parser API on RapidAPI and obtain your API key.

#### Step 1: Create a RapidAPI Account
1. **Visit RapidAPI Website**: Go to [RapidAPI](https://rapidapi.com/).
2. **Sign Up/Log In**:
    - If you don't have an account, sign up using your email, Google, or GitHub account.
    - If you already have an account, log in.

#### Step 2: Find the API
1. **Open the CV Parser API page**: https://rapidapi.com/parsing-ai-parsing-ai-default/api/cv-parser

#### Step 3: Subscribe to the API
1. **Pricing**: Review the pricing plans available for the API. Some APIs offer free tiers, while others may require a paid subscription.
2. **Subscribe**: Click on the subscription plan that fits your needs. You may need to provide payment information if you choose a paid plan.

#### Step 4: Obtain API Credentials
1. **Navigate to 'Endpoints'**: Once subscribed, go to the 'Endpoints' tab on the API's details page.
2. **Get Credentials**: You will find your unique API key or credentials that you will use to authenticate your requests. This is usually found in the 'Request Headers' or 'Authentication' section.

#### Step 5: Use the API Key
1. **Integrate API Key**: Include the API key in the headers of every API request. The code cells in this notebook send it together with the host header:
    ```http
    x-rapidapi-key: YOUR_API_KEY
    x-rapidapi-host: cv-parser.p.rapidapi.com
    ```
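Here is a minimal sketch of building these headers in Python. It assumes the key is stored in the `rapid_key` environment variable, as in the notebook's first code cell. The hypothetical helper `rapidapi_headers` fails early when the key is missing, so an unauthenticated request is never sent.

```python
import os

# Host value used by every request in this notebook.
RAPIDAPI_HOST = "cv-parser.p.rapidapi.com"


def rapidapi_headers():
    """Hypothetical helper: build RapidAPI headers from the rapid_key env var."""
    api_key = os.getenv("rapid_key")
    if not api_key:
        # Stop here rather than sending a request without a key.
        raise RuntimeError("Set the rapid_key environment variable first")
    return {
        "x-rapidapi-key": api_key,
        "x-rapidapi-host": RAPIDAPI_HOST,
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as `headers=` to `requests.post`.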


### Notes
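1. When using a public link, make sure the file is publicly accessible (no sign-in required) and the link points to the file itself.
2. When sending a base64 string, encode the raw file bytes with standard base64 and decode the result to text before adding it to the JSON payload.
3. This project was initiated by the Odlica company: https://odlica.com/.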
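You can check the base64 note before sending a request. The sketch below encodes a file the same way the code cells do and verifies that a string is strict base64. `encode_file` and `is_valid_base64` are hypothetical helper names. With `validate=True`, `base64.b64decode` raises `binascii.Error` on non-alphabet characters or incorrect padding.

```python
import base64
import binascii


def encode_file(path):
    """Hypothetical helper: read a file in binary mode, return base64 text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def is_valid_base64(text):
    """Hypothetical helper: True if text decodes as strict base64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        # binascii.Error: bad characters or padding; ValueError: non-ASCII str
        return False
```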

In [None]:
# Set your RapidAPI key after the "=" below; later cells read it with os.getenv("rapid_key")
%env rapid_key=

# Parse a CV from a local file path: `pdf`, `docx`, `doc`, `png`, `jpeg`, `jpg`
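Both sections below build the same kind of base64 payload, and only `file_type` changes. Here is a minimal sketch that assumes the file extension reliably reflects the file's format. It derives `file_type` from the path, so the two cannot disagree. `payload_from_local_file` is a hypothetical helper, not part of the API.

```python
import base64
from pathlib import Path

# File types listed in this notebook.
SUPPORTED_TYPES = {"pdf", "doc", "docx", "jpg", "jpeg", "png"}


def payload_from_local_file(cv_path):
    """Hypothetical helper: read a local CV and build a base64 payload."""
    path = Path(cv_path)
    # Derive file_type from the extension, e.g. "/x/cv.JPG" -> "jpg".
    file_type = path.suffix.lower().lstrip(".")
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported extension: {path.suffix!r}")
    data_bytes = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"data_bytes": data_bytes, "isbytes": True, "file_type": file_type}
```

The returned dictionary can be sent with `requests.post(url, json=payload, headers=headers)`, exactly as in the cells below.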

## Image files: `png`, `jpeg`, `jpg`

In [None]:
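# Example: parse an image CV (png, jpeg, jpg) sent as a base64 string
import base64
import os

import requests

cv_path = "/CV/image/path/.jpg"  # placeholder: replace with the path to your image

# Read the raw bytes and encode them as base64 text for the JSON body
with open(cv_path, "rb") as image_file:
    base64_cv = base64.b64encode(image_file.read()).decode()

url = "https://cv-parser.p.rapidapi.com/parse"

# Define the payload; file_type must match the file's actual format
payload = {
    "data_bytes": base64_cv,
    "isbytes": True,
    "file_type": "jpg",  # one of ["pdf", "docx", "doc", "png", "jpeg", "jpg"]
}

headers = {
    "x-rapidapi-key": os.getenv("rapid_key"),
    "x-rapidapi-host": "cv-parser.p.rapidapi.com",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)

# Show the status and raw body on failure instead of a JSON decoding error
if response.ok:
    print(response.json())
else:
    print(f"Request failed with HTTP {response.status_code}: {response.text}")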



## Document files: `pdf`, `docx`, `doc`

In [None]:
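# Example: parse a document CV (pdf, docx, doc) sent as a base64 string
import base64
import os

import requests

cv_path = "/CV/PDF/path/.pdf"  # placeholder: example of a pdf path
# cv_path = "/CV/Docx/path/.docx"  # placeholder: example of a docx path

# Read the raw file bytes (the same code works for pdf, docx and doc)
with open(cv_path, "rb") as f:
    file_bytes = f.read()

base64_cv = base64.b64encode(file_bytes).decode()

url = "https://cv-parser.p.rapidapi.com/parse"

# Define the payload; set file_type to "docx" or "doc" when sending those files
payload = {
    "data_bytes": base64_cv,
    "isbytes": True,
    "file_type": "pdf",  # one of ["pdf", "docx", "doc", "png", "jpeg", "jpg"]; pdf by default
}

headers = {
    "x-rapidapi-key": os.getenv("rapid_key"),
    "x-rapidapi-host": "cv-parser.p.rapidapi.com",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)

# Show the status and raw body on failure instead of a JSON decoding error
if response.ok:
    print(response.json())
else:
    print(f"Request failed with HTTP {response.status_code}: {response.text}")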


# Parse a CV from a public download link
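The cell below hard-codes one Google Drive link form and comments out the other. The sketch below builds either form from a file ID and assumes the file is shared publicly (see note 1). `drive_file_url` is a hypothetical helper, and both URL patterns are copied from the cell below.

```python
def drive_file_url(drive_id, export_docs_as_pdf=False):
    """Hypothetical helper: build one of the two Google link forms below."""
    if export_docs_as_pdf:
        # Export link that returns the document as PDF; send file_type "pdf".
        return f"https://docs.google.com/document/d/{drive_id}/export?format=pdf"
    # Direct download of a Drive file; file_type must match the file itself.
    return f"https://drive.google.com/uc?export=download&id={drive_id}"
```

The result goes into the `file_url` field, with `isbytes` set to `False`.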

In [None]:
# Example: parse a CV from a public download link (here, a Google Drive file)
import os

import requests

drive_id = "1M..."  # placeholder: ID of a publicly shared Google Drive file

url = "https://cv-parser.p.rapidapi.com/parse"

payload = {
    # The download link can point to any supported file: ["pdf", "docx", "doc", "png", "jpeg", "jpg"]

    # Normal download link for a file stored in Google Drive
    "file_url": f"https://drive.google.com/uc?export=download&id={drive_id}",

    # Alternative: export link that returns a docx document as pdf
    # "file_url": f"https://docs.google.com/document/d/{drive_id}/export?format=pdf",

    "isbytes": False,
    "file_type": "pdf",  # must match the downloaded file; pdf by default
}

headers = {
    "x-rapidapi-key": os.getenv("rapid_key"),
    "x-rapidapi-host": "cv-parser.p.rapidapi.com",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)

# Show the status and raw body on failure instead of a JSON decoding error
if response.ok:
    print(response.json())
else:
    print(f"Request failed with HTTP {response.status_code}: {response.text}")
