diff --git a/README.md b/README.md new file mode 100644 index 0000000..1a6c006 --- /dev/null +++ b/README.md @@ -0,0 +1 @@ +Notebooks from [docs.unstructured.io/examplecode/notebooks](https://docs.unstructured.io/examplecode/notebooks) \ No newline at end of file diff --git a/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb b/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb new file mode 100644 index 0000000..cbffcba --- /dev/null +++ b/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb @@ -0,0 +1,1126 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Transforming Unstructured Data from an AWS S3 bucket into RAG-Ready Data in SingleStore DB" + ], + "metadata": { + "id": "sGT7Sjb93Zt6" + }, + "id": "sGT7Sjb93Zt6" + }, + { + "metadata": { + "id": "138e15d87f040227" + }, + "cell_type": "markdown", + "source": [ + "In this quick tutorial we'll ingest PDFs from an S3 bucket, transform them into normalized JSON with Unstructured, and then chunk, embed, and load the results into SingleStore DB.\n", + "\n", + "**This notebook was designed to run with a local SingleStoreDB container. Download the notebook to use it as is on your machine.**\n", + "\n", + "Prerequisites:\n", + "\n", + "A. Get your [Unstructured Serverless API key](https://unstructured.io/api-key-hosted). It comes with a 14-day trial and a cap of 1,000 pages per day.\n", + "\n", + "B. Create an AWS S3 bucket and populate it with PDFs of your choice. Make sure to note down your credentials.\n", + "\n", + "C. Install Docker to run the [SingleStore DB Dev Container](https://github.com/singlestore-labs/singlestoredb-dev-image).\n", + "\n", + "D. 
Install the necessary libraries:" + ], + "id": "138e15d87f040227" + }, + { + "cell_type": "code", + "id": "initial_id", + "metadata": { + "collapsed": true, + "ExecuteTime": { + "end_time": "2024-07-03T13:42:15.075048Z", + "start_time": "2024-07-03T13:41:09.778849Z" + }, + "id": "initial_id", + "outputId": "0713b2f1-b808-45ad-fdf8-e4f10cba2189" + }, + "source": [ + "!pip install -q -U \"unstructured[s3, pdf, singlestore, embed-huggingface]\"" + ], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: Skipping unstructured as it is not installed.\u001b[0m\u001b[33m\r\n", + "\u001b[0mCollecting unstructured[embed-huggingface,pdf,s3]\r\n", + " Obtaining dependency information for unstructured[embed-huggingface,pdf,s3] from https://files.pythonhosted.org/packages/62/e2/4356f12efd277fac39e80dfe1e00c9e9798ea9ebb6159acb0ec6f5af938b/unstructured-0.14.9-py3-none-any.whl.metadata\r\n", + " Downloading unstructured-0.14.9-py3-none-any.whl.metadata (28 kB)\r\n", + "Collecting chardet (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for chardet from https://files.pythonhosted.org/packages/38/6f/f5fbc992a329ee4e0f288c1fe0e2ad9485ed064cac731ed2fe47dcc38cbf/chardet-5.2.0-py3-none-any.whl.metadata\r\n", + " Using cached chardet-5.2.0-py3-none-any.whl.metadata (3.4 kB)\r\n", + "Collecting filetype (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for filetype from https://files.pythonhosted.org/packages/18/79/1b8fa1bb3568781e84c9200f951c735f3f157429f44be0495da55894d620/filetype-1.2.0-py2.py3-none-any.whl.metadata\r\n", + " Using cached filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)\r\n", + "Collecting python-magic (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for python-magic from 
https://files.pythonhosted.org/packages/6c/73/9f872cb81fc5c3bb48f7227872c28975f998f3e7c2b1c16e95e6432bbb90/python_magic-0.4.27-py2.py3-none-any.whl.metadata\r\n", + " Using cached python_magic-0.4.27-py2.py3-none-any.whl.metadata (5.8 kB)\r\n", + "Collecting lxml (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for lxml from https://files.pythonhosted.org/packages/9b/e9/73c7e6f9a933ee82cd68599d6291c875379cbce2c47717b811744cfd2256/lxml-5.2.2-cp310-cp310-macosx_10_9_universal2.whl.metadata\r\n", + " Using cached lxml-5.2.2-cp310-cp310-macosx_10_9_universal2.whl.metadata (3.4 kB)\r\n", + "Collecting nltk (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for nltk from https://files.pythonhosted.org/packages/a6/0a/0d20d2c0f16be91b9fa32a77b76c60f9baf6eba419e5ef5deca17af9c582/nltk-3.8.1-py3-none-any.whl.metadata\r\n", + " Using cached nltk-3.8.1-py3-none-any.whl.metadata (2.8 kB)\r\n", + "Collecting tabulate (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for tabulate from https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl.metadata\r\n", + " Using cached tabulate-0.9.0-py3-none-any.whl.metadata (34 kB)\r\n", + "Requirement already satisfied: requests in ./.venv/lib/python3.10/site-packages (from unstructured[embed-huggingface,pdf,s3]) (2.32.3)\r\n", + "Requirement already satisfied: beautifulsoup4 in ./.venv/lib/python3.10/site-packages (from unstructured[embed-huggingface,pdf,s3]) (4.12.3)\r\n", + "Collecting emoji (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for emoji from https://files.pythonhosted.org/packages/e6/90/20ad30babfa8f2b5ab46281d8e17bdfdbb3ac294cda14d525b9c2d958846/emoji-2.12.1-py3-none-any.whl.metadata\r\n", + " Using cached emoji-2.12.1-py3-none-any.whl.metadata (5.4 kB)\r\n", + "Collecting dataclasses-json 
(from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for dataclasses-json from https://files.pythonhosted.org/packages/c3/be/d0d44e092656fe7a06b55e6103cbce807cdbdee17884a5367c68c9860853/dataclasses_json-0.6.7-py3-none-any.whl.metadata\r\n", + " Using cached dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)\r\n", + "Collecting python-iso639 (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for python-iso639 from https://files.pythonhosted.org/packages/01/08/5e649cf18dec750d498c53c6c8eb1d9790752ebd50fa7f7e69cc0c277cfe/python_iso639-2024.4.27-py3-none-any.whl.metadata\r\n", + " Using cached python_iso639-2024.4.27-py3-none-any.whl.metadata (13 kB)\r\n", + "Collecting langdetect (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Using cached langdetect-1.0.9-py3-none-any.whl\r\n", + "Collecting numpy<2 (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for numpy<2 from https://files.pythonhosted.org/packages/20/f7/b24208eba89f9d1b58c1668bc6c8c4fd472b20c45573cb767f59d49fb0f6/numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (61 kB)\r\n", + "Collecting rapidfuzz (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for rapidfuzz from https://files.pythonhosted.org/packages/39/57/948ea4d84a5c80caa0aebc443a13d094510c068b9a128904da79a1866a9d/rapidfuzz-3.9.4-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Downloading rapidfuzz-3.9.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (12 kB)\r\n", + "Collecting backoff (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for backoff from https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl.metadata\r\n", + " Using cached backoff-2.2.1-py3-none-any.whl.metadata (14 
kB)\r\n", + "Requirement already satisfied: typing-extensions in ./.venv/lib/python3.10/site-packages (from unstructured[embed-huggingface,pdf,s3]) (4.12.2)\r\n", + "Collecting unstructured-client (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for unstructured-client from https://files.pythonhosted.org/packages/f2/24/c429bc0db63563af3ed1c0e07e2a14d935f834ffd2e20e3d58e64cea1625/unstructured_client-0.23.8-py3-none-any.whl.metadata\r\n", + " Downloading unstructured_client-0.23.8-py3-none-any.whl.metadata (12 kB)\r\n", + "Collecting wrapt (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for wrapt from https://files.pythonhosted.org/packages/32/12/e11adfde33444986135d8881b401e4de6cbb4cced046edc6b464e6ad7547/wrapt-1.16.0-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached wrapt-1.16.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)\r\n", + "Collecting tqdm (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for tqdm from https://files.pythonhosted.org/packages/18/eb/fdb7eb9e48b7b02554e1664afd3bd3f117f6b6d6c5881438a0b055554f9b/tqdm-4.66.4-py3-none-any.whl.metadata\r\n", + " Using cached tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)\r\n", + "Collecting huggingface (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for huggingface from https://files.pythonhosted.org/packages/f4/8c/e61fbc39c0a37140e1d4941c4af29e2d53bacf9f4559e3de24d8f4e484f0/huggingface-0.0.1-py3-none-any.whl.metadata\r\n", + " Using cached huggingface-0.0.1-py3-none-any.whl.metadata (2.9 kB)\r\n", + "Collecting langchain-community (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for langchain-community from https://files.pythonhosted.org/packages/be/34/9e064763f81811257667a2a6fc83a3b3e679b62a38fdb6a996c6661c7cd2/langchain_community-0.2.6-py3-none-any.whl.metadata\r\n", + " Downloading 
langchain_community-0.2.6-py3-none-any.whl.metadata (2.5 kB)\r\n", + "Collecting sentence-transformers (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for sentence-transformers from https://files.pythonhosted.org/packages/58/4b/922436953394e1bfda05e4bf1fe0e80f609770f256c59a9df7a9254f3e0d/sentence_transformers-3.0.1-py3-none-any.whl.metadata\r\n", + " Using cached sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)\r\n", + "Collecting onnx (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for onnx from https://files.pythonhosted.org/packages/7d/bf/810fe3215735ff55a2b65d0430ba9782b70916d67554d9c2c58cebeace45/onnx-1.16.1-cp310-cp310-macosx_11_0_universal2.whl.metadata\r\n", + " Using cached onnx-1.16.1-cp310-cp310-macosx_11_0_universal2.whl.metadata (16 kB)\r\n", + "Collecting pdf2image (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pdf2image from https://files.pythonhosted.org/packages/62/33/61766ae033518957f877ab246f87ca30a85b778ebaad65b7f74fa7e52988/pdf2image-1.17.0-py3-none-any.whl.metadata\r\n", + " Using cached pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)\r\n", + "Collecting pdfminer.six (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pdfminer.six from https://files.pythonhosted.org/packages/eb/9c/e46fe7502b32d7db6af6e36a9105abb93301fa1ec475b5ddcba8b35ae23a/pdfminer.six-20231228-py3-none-any.whl.metadata\r\n", + " Using cached pdfminer.six-20231228-py3-none-any.whl.metadata (4.2 kB)\r\n", + "Collecting pikepdf (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pikepdf from https://files.pythonhosted.org/packages/53/a9/672096a2ce650320fae7dc003c4140673d316ecce42bb2717dc73932a503/pikepdf-9.0.0-cp310-cp310-macosx_14_0_arm64.whl.metadata\r\n", + " Using cached pikepdf-9.0.0-cp310-cp310-macosx_14_0_arm64.whl.metadata (8.5 kB)\r\n", + 
"Collecting pillow-heif (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pillow-heif from https://files.pythonhosted.org/packages/fd/88/47254fab513e08d028a6c886c6face7d5328df02b9448a0222bf836fc26b/pillow_heif-0.17.0-cp310-cp310-macosx_14_0_arm64.whl.metadata\r\n", + " Downloading pillow_heif-0.17.0-cp310-cp310-macosx_14_0_arm64.whl.metadata (9.9 kB)\r\n", + "Collecting pypdf (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pypdf from https://files.pythonhosted.org/packages/c9/d1/450b19bbdbb2c802f554312c62ce2a2c0d8744fe14735bc70ad2803578c7/pypdf-4.2.0-py3-none-any.whl.metadata\r\n", + " Using cached pypdf-4.2.0-py3-none-any.whl.metadata (7.4 kB)\r\n", + "Collecting pytesseract (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pytesseract from https://files.pythonhosted.org/packages/c5/54/ec007336f38d2d4ce61f3544af3e6855dacbf04a1ac8294f10cabe81146f/pytesseract-0.3.10-py3-none-any.whl.metadata\r\n", + " Using cached pytesseract-0.3.10-py3-none-any.whl.metadata (11 kB)\r\n", + "Collecting google-cloud-vision (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for google-cloud-vision from https://files.pythonhosted.org/packages/04/fc/1e2dfc127d6178b2aed6066bd5b556f4c9dd126d027e0d9855d8078cdeef/google_cloud_vision-3.7.2-py2.py3-none-any.whl.metadata\r\n", + " Using cached google_cloud_vision-3.7.2-py2.py3-none-any.whl.metadata (5.2 kB)\r\n", + "Collecting effdet (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for effdet from https://files.pythonhosted.org/packages/9c/13/563119fe0af82aca5a3b89399c435953072c39515c2e818eb82793955c3b/effdet-0.4.1-py3-none-any.whl.metadata\r\n", + " Using cached effdet-0.4.1-py3-none-any.whl.metadata (33 kB)\r\n", + "Collecting unstructured-inference==0.7.36 (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency 
information for unstructured-inference==0.7.36 from https://files.pythonhosted.org/packages/99/45/9bf0a37f38a0856c3d318a21aeb27255e5e9fd5d4ce5345c910442832e32/unstructured_inference-0.7.36-py3-none-any.whl.metadata\r\n", + " Downloading unstructured_inference-0.7.36-py3-none-any.whl.metadata (5.9 kB)\r\n", + "Collecting unstructured.pytesseract>=0.3.12 (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for unstructured.pytesseract>=0.3.12 from https://files.pythonhosted.org/packages/c5/83/4554641f47672fe915be03101cf1c41ab8a3d373518b3240deb8e3a9e7ac/unstructured.pytesseract-0.3.12-py3-none-any.whl.metadata\r\n", + " Using cached unstructured.pytesseract-0.3.12-py3-none-any.whl.metadata (11 kB)\r\n", + "Collecting s3fs (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for s3fs from https://files.pythonhosted.org/packages/d6/00/f933f6e3d669d4532bc88ada9508e5b873d8768cb54e16563b13f4e968a7/s3fs-2024.6.1-py3-none-any.whl.metadata\r\n", + " Downloading s3fs-2024.6.1-py3-none-any.whl.metadata (1.6 kB)\r\n", + "Collecting fsspec (from unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for fsspec from https://files.pythonhosted.org/packages/5e/44/73bea497ac69bafde2ee4269292fa3b41f1198f4bb7bbaaabde30ad29d4a/fsspec-2024.6.1-py3-none-any.whl.metadata\r\n", + " Downloading fsspec-2024.6.1-py3-none-any.whl.metadata (11 kB)\r\n", + "Collecting layoutparser (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for layoutparser from https://files.pythonhosted.org/packages/08/cf/0bfbea1b2ace91af45e15bdec885e05992dc9150907a8398b3d305eddfd2/layoutparser-0.3.4-py3-none-any.whl.metadata\r\n", + " Using cached layoutparser-0.3.4-py3-none-any.whl.metadata (7.7 kB)\r\n", + "Collecting python-multipart (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency 
information for python-multipart from https://files.pythonhosted.org/packages/3d/47/444768600d9e0ebc82f8e347775d24aef8f6348cf00e9fa0e81910814e6d/python_multipart-0.0.9-py3-none-any.whl.metadata\r\n", + " Using cached python_multipart-0.0.9-py3-none-any.whl.metadata (2.5 kB)\r\n", + "Collecting huggingface-hub (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for huggingface-hub from https://files.pythonhosted.org/packages/69/d6/73f9d1b7c4da5f0544bc17680d0fa9932445423b90cd38e1ee77d001a4f5/huggingface_hub-0.23.4-py3-none-any.whl.metadata\r\n", + " Using cached huggingface_hub-0.23.4-py3-none-any.whl.metadata (12 kB)\r\n", + "Collecting opencv-python!=4.7.0.68 (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for opencv-python!=4.7.0.68 from https://files.pythonhosted.org/packages/66/82/564168a349148298aca281e342551404ef5521f33fba17b388ead0a84dc5/opencv_python-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached opencv_python-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl.metadata (20 kB)\r\n", + "Collecting onnxruntime>=1.17.0 (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for onnxruntime>=1.17.0 from https://files.pythonhosted.org/packages/02/83/b72ef2d6cc8f8b4d60bc6b41641eaa8975c5f968a49bc69ff3c5e9b28b7f/onnxruntime-1.18.1-cp310-cp310-macosx_11_0_universal2.whl.metadata\r\n", + " Downloading onnxruntime-1.18.1-cp310-cp310-macosx_11_0_universal2.whl.metadata (4.3 kB)\r\n", + "Collecting matplotlib (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for matplotlib from https://files.pythonhosted.org/packages/f7/1f/a0f1a692af13b85335a9d7bd226fc0cae8d0062f1fb940980bc9b38d3b5c/matplotlib-3.9.0-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached 
matplotlib-3.9.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (11 kB)\r\n", + "Collecting torch (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for torch from https://files.pythonhosted.org/packages/2c/52/7ab0a00b54aa1651e79a9ebc721d45fba86d8c8ab65c4ec6e0a49f09527a/torch-2.3.1-cp310-none-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl.metadata (26 kB)\r\n", + "Collecting timm (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for timm from https://files.pythonhosted.org/packages/fe/85/834c70052b518bb8fca457d8ea4e60a65d9cc41f77fd409eeff3b3041638/timm-1.0.7-py3-none-any.whl.metadata\r\n", + " Using cached timm-1.0.7-py3-none-any.whl.metadata (47 kB)\r\n", + "Collecting transformers>=4.25.1 (from unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for transformers>=4.25.1 from https://files.pythonhosted.org/packages/20/5c/244db59e074e80248fdfa60495eeee257e4d97c3df3487df68be30cd60c8/transformers-4.42.3-py3-none-any.whl.metadata\r\n", + " Downloading transformers-4.42.3-py3-none-any.whl.metadata (43 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hRequirement already satisfied: packaging>=21.3 in ./.venv/lib/python3.10/site-packages (from unstructured.pytesseract>=0.3.12->unstructured[embed-huggingface,pdf,s3]) (24.1)\r\n", + "Collecting Pillow>=8.0.0 (from unstructured.pytesseract>=0.3.12->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for Pillow>=8.0.0 from https://files.pythonhosted.org/packages/9a/9e/4143b907be8ea0bce215f2ae4f7480027473f8b61fcedfda9d851082a5d2/pillow-10.4.0-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Downloading 
pillow-10.4.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.2 kB)\r\n", + "Requirement already satisfied: soupsieve>1.2 in ./.venv/lib/python3.10/site-packages (from beautifulsoup4->unstructured[embed-huggingface,pdf,s3]) (2.5)\r\n", + "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for marshmallow<4.0.0,>=3.18.0 from https://files.pythonhosted.org/packages/96/d7/f318261e6ccbba86bdf626e07cd850981508fdaec52cfcdc4ac1030327ab/marshmallow-3.21.3-py3-none-any.whl.metadata\r\n", + " Using cached marshmallow-3.21.3-py3-none-any.whl.metadata (7.1 kB)\r\n", + "Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for typing-inspect<1,>=0.4.0 from https://files.pythonhosted.org/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl.metadata\r\n", + " Using cached typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\r\n", + "Collecting torchvision (from effdet->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for torchvision from https://files.pythonhosted.org/packages/f2/31/867be50508348030afea933e859bd7bbeb86924a6c2e35faf7777fbd6f55/torchvision-0.18.1-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached torchvision-0.18.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)\r\n", + "Collecting pycocotools>=2.0.2 (from effdet->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pycocotools>=2.0.2 from https://files.pythonhosted.org/packages/e1/03/8738c457ca04aed97f79781827b20862e78262da7ccc8062bcc6d6e857e2/pycocotools-2.0.8-cp310-cp310-macosx_10_9_universal2.whl.metadata\r\n", + " Using cached pycocotools-2.0.8-cp310-cp310-macosx_10_9_universal2.whl.metadata (1.1 kB)\r\n", + "Collecting omegaconf>=2.0 (from 
effdet->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for omegaconf>=2.0 from https://files.pythonhosted.org/packages/e3/94/1843518e420fa3ed6919835845df698c7e27e183cb997394e4a670973a65/omegaconf-2.3.0-py3-none-any.whl.metadata\r\n", + " Using cached omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)\r\n", + "Collecting google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1 (from google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1 from https://files.pythonhosted.org/packages/44/99/daa3541e8ecd7d8b7907b714ba92126097a976b5b3dbabdb5febdcf08554/google_api_core-2.19.1-py3-none-any.whl.metadata\r\n", + " Downloading google_api_core-2.19.1-py3-none-any.whl.metadata (2.7 kB)\r\n", + "Collecting google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1 (from google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1 from https://files.pythonhosted.org/packages/41/de/cc6304d968d816def285be94d54760733574045e86368620ec5b07d83d6a/google_auth-2.31.0-py2.py3-none-any.whl.metadata\r\n", + " Downloading google_auth-2.31.0-py2.py3-none-any.whl.metadata (4.7 kB)\r\n", + "Collecting proto-plus<2.0.0dev,>=1.22.3 (from google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for proto-plus<2.0.0dev,>=1.22.3 from https://files.pythonhosted.org/packages/7c/6f/db31f0711c0402aa477257205ce7d29e86a75cb52cd19f7afb585f75cda0/proto_plus-1.24.0-py3-none-any.whl.metadata\r\n", + " Using cached proto_plus-1.24.0-py3-none-any.whl.metadata (2.2 kB)\r\n", + "Collecting protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 (from 
google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 from https://files.pythonhosted.org/packages/f3/bf/26deba06a4c910a85f78245cac7698f67cedd7efe00d04f6b3e1b3506a59/protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl.metadata\r\n", + " Using cached protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl.metadata (541 bytes)\r\n", + "Requirement already satisfied: PyYAML>=5.3 in ./.venv/lib/python3.10/site-packages (from langchain-community->unstructured[embed-huggingface,pdf,s3]) (6.0.1)\r\n", + "Collecting SQLAlchemy<3,>=1.4 (from langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for SQLAlchemy<3,>=1.4 from https://files.pythonhosted.org/packages/67/cb/c1d08c7769ccd3c33078c39ea92eda2e6864dbfb6a9e2dfff8b812038972/SQLAlchemy-2.0.31-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached SQLAlchemy-2.0.31-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.6 kB)\r\n", + "Collecting aiohttp<4.0.0,>=3.8.3 (from langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for aiohttp<4.0.0,>=3.8.3 from https://files.pythonhosted.org/packages/a9/51/d95cab6dbee773c57ff590d218633e7b9d52a103bc51060483349f3c8e1e/aiohttp-3.9.5-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached aiohttp-3.9.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (7.5 kB)\r\n", + "Collecting langchain<0.3.0,>=0.2.6 (from langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for langchain<0.3.0,>=0.2.6 from https://files.pythonhosted.org/packages/3f/ae/a5ac2059e8dcc326f8ad4e10d907f6fdf1f920ab060020f12ad64ca26172/langchain-0.2.6-py3-none-any.whl.metadata\r\n", + " Downloading langchain-0.2.6-py3-none-any.whl.metadata (7.0 kB)\r\n", + "Collecting langchain-core<0.3.0,>=0.2.10 (from 
langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for langchain-core<0.3.0,>=0.2.10 from https://files.pythonhosted.org/packages/00/eb/4c320b83d05533f09f9c9d79cdcaea362b05974f3393f74077673cbd16b2/langchain_core-0.2.11-py3-none-any.whl.metadata\r\n", + " Downloading langchain_core-0.2.11-py3-none-any.whl.metadata (6.0 kB)\r\n", + "Collecting langsmith<0.2.0,>=0.1.0 (from langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for langsmith<0.2.0,>=0.1.0 from https://files.pythonhosted.org/packages/0e/e9/9fb1c3ed03a6470cd6810b46da592e5bfb29a97dc5ea361920e464d3ccb1/langsmith-0.1.83-py3-none-any.whl.metadata\r\n", + " Downloading langsmith-0.1.83-py3-none-any.whl.metadata (13 kB)\r\n", + "Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for tenacity!=8.4.0,<9.0.0,>=8.1.0 from https://files.pythonhosted.org/packages/e3/ee/b179c3ab5cb842d75c65339c4b86b572eaf8f43407890bd1d2c7b72eb829/tenacity-8.4.2-py3-none-any.whl.metadata\r\n", + " Downloading tenacity-8.4.2-py3-none-any.whl.metadata (1.2 kB)\r\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in ./.venv/lib/python3.10/site-packages (from requests->unstructured[embed-huggingface,pdf,s3]) (3.3.2)\r\n", + "Requirement already satisfied: idna<4,>=2.5 in ./.venv/lib/python3.10/site-packages (from requests->unstructured[embed-huggingface,pdf,s3]) (3.7)\r\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in ./.venv/lib/python3.10/site-packages (from requests->unstructured[embed-huggingface,pdf,s3]) (2.2.2)\r\n", + "Requirement already satisfied: certifi>=2017.4.17 in ./.venv/lib/python3.10/site-packages (from requests->unstructured[embed-huggingface,pdf,s3]) (2024.6.2)\r\n", + "Requirement already satisfied: six in ./.venv/lib/python3.10/site-packages (from 
langdetect->unstructured[embed-huggingface,pdf,s3]) (1.16.0)\r\n", + "Collecting click (from nltk->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for click from https://files.pythonhosted.org/packages/00/2e/d53fa4befbf2cfa713304affc7ca780ce4fc1fd8710527771b58311a3229/click-8.1.7-py3-none-any.whl.metadata\r\n", + " Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)\r\n", + "Collecting joblib (from nltk->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for joblib from https://files.pythonhosted.org/packages/91/29/df4b9b42f2be0b623cbd5e2140cafcaa2bef0759a00b7b70104dcfe2fb51/joblib-1.4.2-py3-none-any.whl.metadata\r\n", + " Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)\r\n", + "Collecting regex>=2021.8.3 (from nltk->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for regex>=2021.8.3 from https://files.pythonhosted.org/packages/97/d5/f2867ce2b016e2bce4f3d2336dd00bd76743131f586e08f816f05111a737/regex-2024.5.15-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached regex-2024.5.15-cp310-cp310-macosx_11_0_arm64.whl.metadata (40 kB)\r\n", + "Collecting cryptography>=36.0.0 (from pdfminer.six->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for cryptography>=36.0.0 from https://files.pythonhosted.org/packages/60/12/f064af29190cdb1d38fe07f3db6126091639e1dece7ec77c4ff037d49193/cryptography-42.0.8-cp39-abi3-macosx_10_12_universal2.whl.metadata\r\n", + " Using cached cryptography-42.0.8-cp39-abi3-macosx_10_12_universal2.whl.metadata (5.3 kB)\r\n", + "Collecting Deprecated (from pikepdf->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for Deprecated from https://files.pythonhosted.org/packages/20/8d/778b7d51b981a96554f29136cd59ca7880bf58094338085bcf2a979a0e6a/Deprecated-1.2.14-py2.py3-none-any.whl.metadata\r\n", + " Using cached 
Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)\r\n", + "Collecting aiobotocore<3.0.0,>=2.5.4 (from s3fs->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for aiobotocore<3.0.0,>=2.5.4 from https://files.pythonhosted.org/packages/30/07/42f884c1600169e4267575cdd261c75dea31782d8fd877bbea358d559416/aiobotocore-2.13.1-py3-none-any.whl.metadata\r\n", + " Downloading aiobotocore-2.13.1-py3-none-any.whl.metadata (22 kB)\r\n", + "Collecting scikit-learn (from sentence-transformers->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for scikit-learn from https://files.pythonhosted.org/packages/1f/c6/ba8e5691acca616adc8f0d6f8f5e79d55b927530aa404ee712b077acf0cf/scikit_learn-1.5.1-cp310-cp310-macosx_12_0_arm64.whl.metadata\r\n", + " Downloading scikit_learn-1.5.1-cp310-cp310-macosx_12_0_arm64.whl.metadata (12 kB)\r\n", + "Collecting scipy (from sentence-transformers->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for scipy from https://files.pythonhosted.org/packages/52/21/05a182fb405a53dfbdf6415308bf185677e89188bc2206de011a3653f48e/scipy-1.14.0-cp310-cp310-macosx_14_0_arm64.whl.metadata\r\n", + " Downloading scipy-1.14.0-cp310-cp310-macosx_14_0_arm64.whl.metadata (60 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m60.8/60.8 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hCollecting deepdiff>=6.0 (from unstructured-client->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for deepdiff>=6.0 from https://files.pythonhosted.org/packages/18/e6/d27d37dc55dbf40cdbd665aa52844b065ac760c9a02a02265f97ea7a4256/deepdiff-7.0.1-py3-none-any.whl.metadata\r\n", + " Using cached deepdiff-7.0.1-py3-none-any.whl.metadata (6.8 kB)\r\n", + "Requirement already satisfied: httpx>=0.27.0 in ./.venv/lib/python3.10/site-packages (from 
unstructured-client->unstructured[embed-huggingface,pdf,s3]) (0.27.0)\r\n", + "Collecting jsonpath-python>=1.0.6 (from unstructured-client->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for jsonpath-python>=1.0.6 from https://files.pythonhosted.org/packages/16/8a/d63959f4eff03893a00e6e63592e3a9f15b9266ed8e0275ab77f8c7dbc94/jsonpath_python-1.0.6-py3-none-any.whl.metadata\r\n", + " Using cached jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)\r\n", + "Collecting mypy-extensions>=1.0.0 (from unstructured-client->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for mypy-extensions>=1.0.0 from https://files.pythonhosted.org/packages/2a/e2/5d3f6ada4297caebe1a2add3b126fe800c96f56dbe5d1988a2cbe0b267aa/mypy_extensions-1.0.0-py3-none-any.whl.metadata\r\n", + " Using cached mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)\r\n", + "Requirement already satisfied: nest-asyncio>=1.6.0 in ./.venv/lib/python3.10/site-packages (from unstructured-client->unstructured[embed-huggingface,pdf,s3]) (1.6.0)\r\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in ./.venv/lib/python3.10/site-packages (from unstructured-client->unstructured[embed-huggingface,pdf,s3]) (2.9.0.post0)\r\n", + "Collecting requests-toolbelt>=1.0.0 (from unstructured-client->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for requests-toolbelt>=1.0.0 from https://files.pythonhosted.org/packages/3f/51/d4db610ef29373b879047326cbf6fa98b6c1969d6f6dc423279de2b1be2c/requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata\r\n", + " Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl.metadata (14 kB)\r\n", + "Collecting botocore<1.34.132,>=1.34.70 (from aiobotocore<3.0.0,>=2.5.4->s3fs->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for botocore<1.34.132,>=1.34.70 from 
https://files.pythonhosted.org/packages/46/1a/01785fad12a9b1dbeffebd97cd226ea5923114057c64a610dd4eb8a28c7b/botocore-1.34.131-py3-none-any.whl.metadata\r\n", + " Downloading botocore-1.34.131-py3-none-any.whl.metadata (5.7 kB)\r\n", + "Collecting aioitertools<1.0.0,>=0.5.1 (from aiobotocore<3.0.0,>=2.5.4->s3fs->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for aioitertools<1.0.0,>=0.5.1 from https://files.pythonhosted.org/packages/45/66/d1a9fd8e6ff88f2157cb145dd054defb0fd7fe2507fe5a01347e7c690eab/aioitertools-0.11.0-py3-none-any.whl.metadata\r\n", + " Using cached aioitertools-0.11.0-py3-none-any.whl.metadata (3.3 kB)\r\n", + "Collecting aiosignal>=1.1.2 (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for aiosignal>=1.1.2 from https://files.pythonhosted.org/packages/76/ac/a7305707cb852b7e16ff80eaf5692309bde30e2b1100a1fcacdc8f731d97/aiosignal-1.3.1-py3-none-any.whl.metadata\r\n", + " Using cached aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)\r\n", + "Requirement already satisfied: attrs>=17.3.0 in ./.venv/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3]) (23.2.0)\r\n", + "Collecting frozenlist>=1.1.1 (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for frozenlist>=1.1.1 from https://files.pythonhosted.org/packages/ae/83/bcdaa437a9bd693ba658a0310f8cdccff26bd78e45fccf8e49897904a5cd/frozenlist-1.4.1-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached frozenlist-1.4.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (12 kB)\r\n", + "Collecting multidict<7.0,>=4.5 (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for multidict<7.0,>=4.5 from 
https://files.pythonhosted.org/packages/a4/eb/d8e7693c9064554a1585698d1902839440c6c695b0f53c9a8be5d9d4a3b8/multidict-6.0.5-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached multidict-6.0.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (4.2 kB)\r\n", + "Collecting yarl<2.0,>=1.0 (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for yarl<2.0,>=1.0 from https://files.pythonhosted.org/packages/81/c6/06938036ea48fa74521713499fba1459b0eb60af9b9afbe8e0e9e1a96c36/yarl-1.9.4-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached yarl-1.9.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (31 kB)\r\n", + "Collecting async-timeout<5.0,>=4.0 (from aiohttp<4.0.0,>=3.8.3->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for async-timeout<5.0,>=4.0 from https://files.pythonhosted.org/packages/a7/fa/e01228c2938de91d47b307831c62ab9e4001e747789d0b05baf779a6488c/async_timeout-4.0.3-py3-none-any.whl.metadata\r\n", + " Using cached async_timeout-4.0.3-py3-none-any.whl.metadata (4.2 kB)\r\n", + "Requirement already satisfied: cffi>=1.12 in ./.venv/lib/python3.10/site-packages (from cryptography>=36.0.0->pdfminer.six->unstructured[embed-huggingface,pdf,s3]) (1.16.0)\r\n", + "Collecting ordered-set<4.2.0,>=4.1.0 (from deepdiff>=6.0->unstructured-client->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for ordered-set<4.2.0,>=4.1.0 from https://files.pythonhosted.org/packages/33/55/af02708f230eb77084a299d7b08175cff006dea4f2721074b92cdb0296c0/ordered_set-4.1.0-py3-none-any.whl.metadata\r\n", + " Using cached ordered_set-4.1.0-py3-none-any.whl.metadata (5.3 kB)\r\n", + "Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from 
google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for googleapis-common-protos<2.0.dev0,>=1.56.2 from https://files.pythonhosted.org/packages/02/48/87422ff1bddcae677fb6f58c97f5cfc613304a5e8ce2c3662760199c0a84/googleapis_common_protos-1.63.2-py2.py3-none-any.whl.metadata\r\n", + " Downloading googleapis_common_protos-1.63.2-py2.py3-none-any.whl.metadata (1.5 kB)\r\n", + "Collecting grpcio<2.0dev,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for grpcio<2.0dev,>=1.33.2 from https://files.pythonhosted.org/packages/62/46/2f080ed826b7641220ba1584960f90dd5354d71eb455a1e3e40c0614cd6b/grpcio-1.64.1-cp310-cp310-macosx_12_0_universal2.whl.metadata\r\n", + " Using cached grpcio-1.64.1-cp310-cp310-macosx_12_0_universal2.whl.metadata (3.3 kB)\r\n", + "Collecting grpcio-status<2.0.dev0,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for grpcio-status<2.0.dev0,>=1.33.2 from https://files.pythonhosted.org/packages/be/b3/2f623a3c88381310055ea5ba782853e69e5c8a41853d260d131bc0b50ef7/grpcio_status-1.64.1-py3-none-any.whl.metadata\r\n", + " Using cached grpcio_status-1.64.1-py3-none-any.whl.metadata (1.1 kB)\r\n", + "Collecting cachetools<6.0,>=2.0.0 (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for cachetools<6.0,>=2.0.0 from 
https://files.pythonhosted.org/packages/fb/2b/a64c2d25a37aeb921fddb929111413049fc5f8b9a4c1aefaffaafe768d54/cachetools-5.3.3-py3-none-any.whl.metadata\r\n", + " Using cached cachetools-5.3.3-py3-none-any.whl.metadata (5.3 kB)\r\n", + "Collecting pyasn1-modules>=0.2.1 (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pyasn1-modules>=0.2.1 from https://files.pythonhosted.org/packages/13/68/8906226b15ef38e71dc926c321d2fe99de8048e9098b5dfd38343011c886/pyasn1_modules-0.4.0-py3-none-any.whl.metadata\r\n", + " Using cached pyasn1_modules-0.4.0-py3-none-any.whl.metadata (3.4 kB)\r\n", + "Collecting rsa<5,>=3.1.4 (from google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for rsa<5,>=3.1.4 from https://files.pythonhosted.org/packages/49/97/fa78e3d2f65c02c8e1268b9aba606569fe97f6c8f7c2d74394553347c145/rsa-4.9-py3-none-any.whl.metadata\r\n", + " Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)\r\n", + "Requirement already satisfied: anyio in ./.venv/lib/python3.10/site-packages (from httpx>=0.27.0->unstructured-client->unstructured[embed-huggingface,pdf,s3]) (4.4.0)\r\n", + "Requirement already satisfied: httpcore==1.* in ./.venv/lib/python3.10/site-packages (from httpx>=0.27.0->unstructured-client->unstructured[embed-huggingface,pdf,s3]) (1.0.5)\r\n", + "Requirement already satisfied: sniffio in ./.venv/lib/python3.10/site-packages (from httpx>=0.27.0->unstructured-client->unstructured[embed-huggingface,pdf,s3]) (1.3.1)\r\n", + "Requirement already satisfied: h11<0.15,>=0.13 in ./.venv/lib/python3.10/site-packages (from httpcore==1.*->httpx>=0.27.0->unstructured-client->unstructured[embed-huggingface,pdf,s3]) (0.14.0)\r\n", + "Collecting filelock (from huggingface-hub->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining 
dependency information for filelock from https://files.pythonhosted.org/packages/ae/f0/48285f0262fe47103a4a45972ed2f9b93e4c80b8fd609fa98da78b2a5706/filelock-3.15.4-py3-none-any.whl.metadata\r\n", + " Downloading filelock-3.15.4-py3-none-any.whl.metadata (2.9 kB)\r\n", + "Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain<0.3.0,>=0.2.6->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for langchain-text-splitters<0.3.0,>=0.2.0 from https://files.pythonhosted.org/packages/06/76/9e0ca1b8881f64bf927f2205bf6c43a085c04646a71d911b3c05d76e90bb/langchain_text_splitters-0.2.2-py3-none-any.whl.metadata\r\n", + " Downloading langchain_text_splitters-0.2.2-py3-none-any.whl.metadata (2.1 kB)\r\n", + "Collecting pydantic<3,>=1 (from langchain<0.3.0,>=0.2.6->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pydantic<3,>=1 from https://files.pythonhosted.org/packages/d3/07/bbfddb7b532b727a5769a8468a67ab388e74c029d4940e5de6b25231aba2/pydantic-2.8.0-py3-none-any.whl.metadata\r\n", + " Downloading pydantic-2.8.0-py3-none-any.whl.metadata (123 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m123.5/123.5 kB\u001b[0m \u001b[31m4.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hCollecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3.0,>=0.2.10->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for jsonpatch<2.0,>=1.33 from https://files.pythonhosted.org/packages/73/07/02e16ed01e04a374e644b575638ec7987ae846d25ad97bcc9945a3ee4b0e/jsonpatch-1.33-py2.py3-none-any.whl.metadata\r\n", + " Using cached jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)\r\n", + "Collecting orjson<4.0.0,>=3.9.14 (from langsmith<0.2.0,>=0.1.0->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for orjson<4.0.0,>=3.9.14 
from https://files.pythonhosted.org/packages/f3/39/780bc1842aefc478ed42ab1dfff49bdd63d7ac27605dc5e69c172378b536/orjson-3.10.6-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl.metadata\r\n", + " Downloading orjson-3.10.6-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl.metadata (50 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.4/50.4 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hCollecting antlr4-python3-runtime==4.9.* (from omegaconf>=2.0->effdet->unstructured[embed-huggingface,pdf,s3])\r\n", + " Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl\r\n", + "Collecting coloredlogs (from onnxruntime>=1.17.0->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for coloredlogs from https://files.pythonhosted.org/packages/a7/06/3d6badcf13db419e25b07041d9c7b4a2c331d3f4e7134445ec5df57714cd/coloredlogs-15.0.1-py2.py3-none-any.whl.metadata\r\n", + " Using cached coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)\r\n", + "Collecting flatbuffers (from onnxruntime>=1.17.0->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for flatbuffers from https://files.pythonhosted.org/packages/41/f0/7e988a019bc54b2dbd0ad4182ef2d53488bb02e58694cd79d61369e85900/flatbuffers-24.3.25-py2.py3-none-any.whl.metadata\r\n", + " Using cached flatbuffers-24.3.25-py2.py3-none-any.whl.metadata (850 bytes)\r\n", + "Collecting sympy (from onnxruntime>=1.17.0->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for sympy from https://files.pythonhosted.org/packages/61/53/e18c8c97d0b2724d85c9830477e3ebea3acf1dcdc6deb344d5d9c93a9946/sympy-1.12.1-py3-none-any.whl.metadata\r\n", + " Using cached sympy-1.12.1-py3-none-any.whl.metadata (12 kB)\r\n", + 
"Collecting contourpy>=1.0.1 (from matplotlib->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for contourpy>=1.0.1 from https://files.pythonhosted.org/packages/d8/d5/f23beca650c8aab67e72f610d65817c68c306e6f6a124ca337fcec7d5d57/contourpy-1.2.1-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached contourpy-1.2.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.8 kB)\r\n", + "Collecting cycler>=0.10 (from matplotlib->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for cycler>=0.10 from https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl.metadata\r\n", + " Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)\r\n", + "Collecting fonttools>=4.22.0 (from matplotlib->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for fonttools>=4.22.0 from https://files.pythonhosted.org/packages/4a/5d/cf58fe32c9ddc6e3189afd09a43de7e6380043e0edabcbfa9708457a36cf/fonttools-4.53.0-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached fonttools-4.53.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (162 kB)\r\n", + "Collecting kiwisolver>=1.3.1 (from matplotlib->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for kiwisolver>=1.3.1 from https://files.pythonhosted.org/packages/23/11/6fb190bae4b279d712a834e7b1da89f6dcff6791132f7399aa28a57c3565/kiwisolver-1.4.5-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached kiwisolver-1.4.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.4 kB)\r\n", + "Collecting pyparsing>=2.3.1 (from matplotlib->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pyparsing>=2.3.1 from 
https://files.pythonhosted.org/packages/9d/ea/6d76df31432a0e6fdf81681a895f009a4bb47b3c39036db3e1b528191d52/pyparsing-3.1.2-py3-none-any.whl.metadata\r\n", + " Using cached pyparsing-3.1.2-py3-none-any.whl.metadata (5.1 kB)\r\n", + "Collecting safetensors (from timm->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for safetensors from https://files.pythonhosted.org/packages/70/a8/0e856138b02bc8cd9fdf2cdd7b7a07e5b8a8d15d26a722567cbdfcee5c62/safetensors-0.4.3-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached safetensors-0.4.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.8 kB)\r\n", + "Collecting networkx (from torch->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for networkx from https://files.pythonhosted.org/packages/38/e9/5f72929373e1a0e8d142a130f3f97e6ff920070f87f91c4e13e40e0fba5a/networkx-3.3-py3-none-any.whl.metadata\r\n", + " Using cached networkx-3.3-py3-none-any.whl.metadata (5.1 kB)\r\n", + "Requirement already satisfied: jinja2 in ./.venv/lib/python3.10/site-packages (from torch->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3]) (3.1.4)\r\n", + "Collecting tokenizers<0.20,>=0.19 (from transformers>=4.25.1->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for tokenizers<0.20,>=0.19 from https://files.pythonhosted.org/packages/4c/12/9cb68762ff5fee1efd51aefe2f62cb225f26f060a68a3779e1060bbc7a59/tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.7 kB)\r\n", + "Collecting pandas (from layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pandas from 
https://files.pythonhosted.org/packages/fd/4b/0cd38e68ab690b9df8ef90cba625bf3f93b82d1c719703b8e1b333b2c72d/pandas-2.2.2-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached pandas-2.2.2-cp310-cp310-macosx_11_0_arm64.whl.metadata (19 kB)\r\n", + "Collecting iopath (from layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Using cached iopath-0.1.10-py3-none-any.whl\r\n", + "Collecting pdfplumber (from layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pdfplumber from https://files.pythonhosted.org/packages/8a/48/c65eea448018b6f55ad262139a32610ac517d2451380e662225c765ada91/pdfplumber-0.11.1-py3-none-any.whl.metadata\r\n", + " Using cached pdfplumber-0.11.1-py3-none-any.whl.metadata (39 kB)\r\n", + "Collecting threadpoolctl>=3.1.0 (from scikit-learn->sentence-transformers->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for threadpoolctl>=3.1.0 from https://files.pythonhosted.org/packages/4b/2c/ffbf7a134b9ab11a67b0cf0726453cedd9c5043a4fe7a35d1cefa9a1bcfb/threadpoolctl-3.5.0-py3-none-any.whl.metadata\r\n", + " Using cached threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)\r\n", + "Collecting jmespath<2.0.0,>=0.7.1 (from botocore<1.34.132,>=1.34.70->aiobotocore<3.0.0,>=2.5.4->s3fs->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for jmespath<2.0.0,>=0.7.1 from https://files.pythonhosted.org/packages/31/b4/b9b800c45527aadd64d5b442f9b932b00648617eb5d63d2c7a6587b7cafc/jmespath-1.0.1-py3-none-any.whl.metadata\r\n", + " Using cached jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)\r\n", + "Requirement already satisfied: pycparser in ./.venv/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six->unstructured[embed-huggingface,pdf,s3]) (2.22)\r\n", + "INFO: pip is looking at multiple versions of grpcio-status to determine which version is 
compatible with other requirements. This could take a while.\r\n", + "Collecting grpcio-status<2.0.dev0,>=1.33.2 (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for grpcio-status<2.0.dev0,>=1.33.2 from https://files.pythonhosted.org/packages/4c/b7/634bee0f33282e03073522ef054ae70ff6bacde417afbf28ef35256cd908/grpcio_status-1.64.0-py3-none-any.whl.metadata\r\n", + " Using cached grpcio_status-1.64.0-py3-none-any.whl.metadata (1.1 kB)\r\n", + " Obtaining dependency information for grpcio-status<2.0.dev0,>=1.33.2 from https://files.pythonhosted.org/packages/ef/22/e67faeb3dbf1271b1a100faeeafada4e362bad32739aeec13dd5c54ebc11/grpcio_status-1.63.0-py3-none-any.whl.metadata\r\n", + " Using cached grpcio_status-1.63.0-py3-none-any.whl.metadata (1.1 kB)\r\n", + " Obtaining dependency information for grpcio-status<2.0.dev0,>=1.33.2 from https://files.pythonhosted.org/packages/33/63/56a8c67a77947d0ebc31a03c5dea8d2c933ab3ad30019b0bafa3e50a5ef6/grpcio_status-1.62.2-py3-none-any.whl.metadata\r\n", + " Using cached grpcio_status-1.62.2-py3-none-any.whl.metadata (1.3 kB)\r\n", + "Requirement already satisfied: jsonpointer>=1.9 in ./.venv/lib/python3.10/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3.0,>=0.2.10->langchain-community->unstructured[embed-huggingface,pdf,s3]) (3.0.0)\r\n", + "Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth!=2.24.0,!=2.25.0,<3.0.0dev,>=2.14.1->google-cloud-vision->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pyasn1<0.7.0,>=0.4.6 from https://files.pythonhosted.org/packages/23/7e/5f50d07d5e70a2addbccd90ac2950f81d1edd0783630651d9268d7f1db49/pyasn1-0.6.0-py2.py3-none-any.whl.metadata\r\n", + " Using cached pyasn1-0.6.0-py2.py3-none-any.whl.metadata (8.3 kB)\r\n", + "Collecting 
annotated-types>=0.4.0 (from pydantic<3,>=1->langchain<0.3.0,>=0.2.6->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for annotated-types>=0.4.0 from https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl.metadata\r\n", + " Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)\r\n", + "Collecting pydantic-core==2.20.0 (from pydantic<3,>=1->langchain<0.3.0,>=0.2.6->langchain-community->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pydantic-core==2.20.0 from https://files.pythonhosted.org/packages/fd/64/eb15006815b1ab2159a8b406323a40ab3fd0cb431f947c387a5771c62581/pydantic_core-2.20.0-cp310-cp310-macosx_11_0_arm64.whl.metadata\r\n", + " Downloading pydantic_core-2.20.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)\r\n", + "Requirement already satisfied: exceptiongroup>=1.0.2 in ./.venv/lib/python3.10/site-packages (from anyio->httpx>=0.27.0->unstructured-client->unstructured[embed-huggingface,pdf,s3]) (1.2.1)\r\n", + "Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime>=1.17.0->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for humanfriendly>=9.1 from https://files.pythonhosted.org/packages/f0/0f/310fb31e39e2d734ccaa2c0fb981ee41f7bd5056ce9bc29b2248bd569169/humanfriendly-10.0-py2.py3-none-any.whl.metadata\r\n", + " Using cached humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)\r\n", + "Collecting portalocker (from iopath->layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for portalocker from https://files.pythonhosted.org/packages/07/ff/52080172c7fdfa7c62f8cab014997178c19be9948607e977184dafc76522/portalocker-2.10.0-py3-none-any.whl.metadata\r\n", + " Downloading 
portalocker-2.10.0-py3-none-any.whl.metadata (8.5 kB)\r\n", + "Requirement already satisfied: MarkupSafe>=2.0 in ./.venv/lib/python3.10/site-packages (from jinja2->torch->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3]) (2.1.5)\r\n", + "Collecting pytz>=2020.1 (from pandas->layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pytz>=2020.1 from https://files.pythonhosted.org/packages/9c/3d/a121f284241f08268b21359bd425f7d4825cffc5ac5cd0e1b3d82ffd2b10/pytz-2024.1-py2.py3-none-any.whl.metadata\r\n", + " Using cached pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)\r\n", + "Collecting tzdata>=2022.7 (from pandas->layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for tzdata>=2022.7 from https://files.pythonhosted.org/packages/65/58/f9c9e6be752e9fcb8b6a0ee9fb87e6e7a1f6bcab2cdc73f02bb7ba91ada0/tzdata-2024.1-py2.py3-none-any.whl.metadata\r\n", + " Using cached tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)\r\n", + "Collecting pypdfium2>=4.18.0 (from pdfplumber->layoutparser->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for pypdfium2>=4.18.0 from https://files.pythonhosted.org/packages/21/8b/27d4d5409f3c76b985f4ee4afe147b606594411e15ac4dc1c3363c9a9810/pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl.metadata\r\n", + " Using cached pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl.metadata (48 kB)\r\n", + "Collecting mpmath<1.4.0,>=1.1.0 (from sympy->onnxruntime>=1.17.0->unstructured-inference==0.7.36->unstructured[embed-huggingface,pdf,s3])\r\n", + " Obtaining dependency information for mpmath<1.4.0,>=1.1.0 from https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl.metadata\r\n", + " Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 
kB)\r\n", + "Downloading unstructured_inference-0.7.36-py3-none-any.whl (56 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.4/56.4 kB\u001b[0m \u001b[31m3.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hUsing cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl (14.0 MB)\r\n", + "Using cached unstructured.pytesseract-0.3.12-py3-none-any.whl (14 kB)\r\n", + "Using cached backoff-2.2.1-py3-none-any.whl (15 kB)\r\n", + "Using cached chardet-5.2.0-py3-none-any.whl (199 kB)\r\n", + "Using cached dataclasses_json-0.6.7-py3-none-any.whl (28 kB)\r\n", + "Using cached effdet-0.4.1-py3-none-any.whl (112 kB)\r\n", + "Using cached emoji-2.12.1-py3-none-any.whl (431 kB)\r\n", + "Using cached filetype-1.2.0-py2.py3-none-any.whl (19 kB)\r\n", + "Downloading fsspec-2024.6.1-py3-none-any.whl (177 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m177.6/177.6 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached google_cloud_vision-3.7.2-py2.py3-none-any.whl (459 kB)\r\n", + "Using cached huggingface-0.0.1-py3-none-any.whl (2.5 kB)\r\n", + "Downloading langchain_community-0.2.6-py3-none-any.whl (2.2 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.2/2.2 MB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached lxml-5.2.2-cp310-cp310-macosx_10_9_universal2.whl (8.1 MB)\r\n", + "Using cached nltk-3.8.1-py3-none-any.whl (1.5 MB)\r\n", + "Using cached onnx-1.16.1-cp310-cp310-macosx_11_0_universal2.whl (16.5 MB)\r\n", + "Using cached pdf2image-1.17.0-py3-none-any.whl (11 kB)\r\n", + "Using cached pdfminer.six-20231228-py3-none-any.whl (5.6 MB)\r\n", + "Using cached pikepdf-9.0.0-cp310-cp310-macosx_14_0_arm64.whl (4.4 MB)\r\n", + "Downloading pillow_heif-0.17.0-cp310-cp310-macosx_14_0_arm64.whl (3.7 
MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.7/3.7 MB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached pypdf-4.2.0-py3-none-any.whl (290 kB)\r\n", + "Using cached pytesseract-0.3.10-py3-none-any.whl (14 kB)\r\n", + "Using cached python_iso639-2024.4.27-py3-none-any.whl (274 kB)\r\n", + "Using cached python_magic-0.4.27-py2.py3-none-any.whl (13 kB)\r\n", + "Downloading rapidfuzz-3.9.4-cp310-cp310-macosx_11_0_arm64.whl (1.5 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.5/1.5 MB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hDownloading s3fs-2024.6.1-py3-none-any.whl (29 kB)\r\n", + "Using cached sentence_transformers-3.0.1-py3-none-any.whl (227 kB)\r\n", + "Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)\r\n", + "Using cached tqdm-4.66.4-py3-none-any.whl (78 kB)\r\n", + "Downloading unstructured-0.14.9-py3-none-any.whl (2.1 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hDownloading unstructured_client-0.23.8-py3-none-any.whl (40 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m41.0/41.0 kB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hUsing cached wrapt-1.16.0-cp310-cp310-macosx_11_0_arm64.whl (38 kB)\r\n", + "Downloading aiobotocore-2.13.1-py3-none-any.whl (76 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.9/76.9 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hUsing cached aiohttp-3.9.5-cp310-cp310-macosx_11_0_arm64.whl (389 kB)\r\n", + "Using cached 
cryptography-42.0.8-cp39-abi3-macosx_10_12_universal2.whl (5.9 MB)\r\n", + "Using cached deepdiff-7.0.1-py3-none-any.whl (80 kB)\r\n", + "Downloading google_auth-2.31.0-py2.py3-none-any.whl (194 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.6/194.6 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached huggingface_hub-0.23.4-py3-none-any.whl (402 kB)\r\n", + "Using cached jsonpath_python-1.0.6-py3-none-any.whl (7.6 kB)\r\n", + "Downloading langchain-0.2.6-py3-none-any.whl (975 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m975.5/975.5 kB\u001b[0m \u001b[31m5.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hDownloading langchain_core-0.2.11-py3-none-any.whl (337 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m337.4/337.4 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hDownloading langsmith-0.1.83-py3-none-any.whl (127 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m127.5/127.5 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hUsing cached marshmallow-3.21.3-py3-none-any.whl (49 kB)\r\n", + "Using cached mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\r\n", + "Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)\r\n", + "Downloading onnxruntime-1.18.1-cp310-cp310-macosx_11_0_universal2.whl (15.9 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m15.9/15.9 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached opencv_python-4.10.0.84-cp37-abi3-macosx_11_0_arm64.whl (54.8 MB)\r\n", + "Downloading 
pillow-10.4.0-cp310-cp310-macosx_11_0_arm64.whl (3.4 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.4/3.4 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached proto_plus-1.24.0-py3-none-any.whl (50 kB)\r\n", + "Using cached protobuf-4.25.3-cp37-abi3-macosx_10_9_universal2.whl (394 kB)\r\n", + "Using cached pycocotools-2.0.8-cp310-cp310-macosx_10_9_universal2.whl (162 kB)\r\n", + "Using cached matplotlib-3.9.0-cp310-cp310-macosx_11_0_arm64.whl (7.8 MB)\r\n", + "Using cached regex-2024.5.15-cp310-cp310-macosx_11_0_arm64.whl (278 kB)\r\n", + "Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)\r\n", + "Using cached SQLAlchemy-2.0.31-cp310-cp310-macosx_11_0_arm64.whl (2.1 MB)\r\n", + "Downloading tenacity-8.4.2-py3-none-any.whl (28 kB)\r\n", + "Using cached timm-1.0.7-py3-none-any.whl (2.3 MB)\r\n", + "Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl (61.0 MB)\r\n", + "Downloading transformers-4.42.3-py3-none-any.whl (9.3 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.3/9.3 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\r\n", + "Using cached click-8.1.7-py3-none-any.whl (97 kB)\r\n", + "Using cached Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)\r\n", + "Using cached joblib-1.4.2-py3-none-any.whl (301 kB)\r\n", + "Using cached layoutparser-0.3.4-py3-none-any.whl (19.2 MB)\r\n", + "Using cached python_multipart-0.0.9-py3-none-any.whl (22 kB)\r\n", + "Downloading scikit_learn-1.5.1-cp310-cp310-macosx_12_0_arm64.whl (11.0 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.0/11.0 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + 
"\u001b[?25hDownloading scipy-1.14.0-cp310-cp310-macosx_14_0_arm64.whl (23.1 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m23.1/23.1 MB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached torchvision-0.18.1-cp310-cp310-macosx_11_0_arm64.whl (1.6 MB)\r\n", + "Using cached aioitertools-0.11.0-py3-none-any.whl (23 kB)\r\n", + "Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\r\n", + "Using cached async_timeout-4.0.3-py3-none-any.whl (5.7 kB)\r\n", + "Downloading botocore-1.34.131-py3-none-any.whl (12.3 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.3/12.3 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached cachetools-5.3.3-py3-none-any.whl (9.3 kB)\r\n", + "Using cached contourpy-1.2.1-cp310-cp310-macosx_11_0_arm64.whl (244 kB)\r\n", + "Using cached cycler-0.12.1-py3-none-any.whl (8.3 kB)\r\n", + "Using cached fonttools-4.53.0-cp310-cp310-macosx_11_0_arm64.whl (2.2 MB)\r\n", + "Using cached frozenlist-1.4.1-cp310-cp310-macosx_11_0_arm64.whl (52 kB)\r\n", + "Downloading googleapis_common_protos-1.63.2-py2.py3-none-any.whl (220 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m220.0/220.0 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached grpcio-1.64.1-cp310-cp310-macosx_12_0_universal2.whl (10.3 MB)\r\n", + "Using cached grpcio_status-1.62.2-py3-none-any.whl (14 kB)\r\n", + "Using cached jsonpatch-1.33-py2.py3-none-any.whl (12 kB)\r\n", + "Using cached kiwisolver-1.4.5-cp310-cp310-macosx_11_0_arm64.whl (66 kB)\r\n", + "Downloading langchain_text_splitters-0.2.2-py3-none-any.whl (25 kB)\r\n", + "Using cached multidict-6.0.5-cp310-cp310-macosx_11_0_arm64.whl (30 kB)\r\n", + "Using cached 
ordered_set-4.1.0-py3-none-any.whl (7.6 kB)\r\n", + "Downloading orjson-3.10.6-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl (250 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m250.5/250.5 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached pyasn1_modules-0.4.0-py3-none-any.whl (181 kB)\r\n", + "Downloading pydantic-2.8.0-py3-none-any.whl (423 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m423.1/423.1 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hDownloading pydantic_core-2.20.0-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m5.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\r\n", + "\u001b[?25hUsing cached pyparsing-3.1.2-py3-none-any.whl (103 kB)\r\n", + "Using cached rsa-4.9-py3-none-any.whl (34 kB)\r\n", + "Using cached safetensors-0.4.3-cp310-cp310-macosx_11_0_arm64.whl (410 kB)\r\n", + "Using cached threadpoolctl-3.5.0-py3-none-any.whl (18 kB)\r\n", + "Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl (2.4 MB)\r\n", + "Using cached yarl-1.9.4-cp310-cp310-macosx_11_0_arm64.whl (79 kB)\r\n", + "Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\r\n", + "Downloading filelock-3.15.4-py3-none-any.whl (16 kB)\r\n", + "Using cached flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB)\r\n", + "Downloading google_api_core-2.19.1-py3-none-any.whl (139 kB)\r\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m139.4/139.4 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\r\n", + "\u001b[?25hUsing cached networkx-3.3-py3-none-any.whl (1.7 MB)\r\n", + "Using cached 
pandas-2.2.2-cp310-cp310-macosx_11_0_arm64.whl (11.3 MB)\r\n", + "Using cached pdfplumber-0.11.1-py3-none-any.whl (57 kB)\r\n", + "Using cached sympy-1.12.1-py3-none-any.whl (5.7 MB)\r\n", + "Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)\r\n", + "Using cached humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\r\n", + "Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)\r\n", + "Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)\r\n", + "Using cached pyasn1-0.6.0-py2.py3-none-any.whl (85 kB)\r\n", + "Using cached pypdfium2-4.30.0-py3-none-macosx_11_0_arm64.whl (2.7 MB)\r\n", + "Using cached pytz-2024.1-py2.py3-none-any.whl (505 kB)\r\n", + "Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)\r\n", + "Downloading portalocker-2.10.0-py3-none-any.whl (18 kB)\r\n", + "Installing collected packages: pytz, mpmath, huggingface, flatbuffers, filetype, antlr4-python3-runtime, wrapt, tzdata, tqdm, threadpoolctl, tenacity, tabulate, sympy, SQLAlchemy, safetensors, regex, rapidfuzz, python-multipart, python-magic, python-iso639, pypdfium2, pypdf, pyparsing, pydantic-core, pyasn1, protobuf, portalocker, Pillow, orjson, ordered-set, omegaconf, numpy, networkx, mypy-extensions, multidict, marshmallow, lxml, langdetect, kiwisolver, jsonpath-python, jsonpatch, joblib, jmespath, humanfriendly, grpcio, fsspec, frozenlist, fonttools, filelock, emoji, cycler, click, chardet, cachetools, backoff, async-timeout, annotated-types, aioitertools, yarl, unstructured.pytesseract, typing-inspect, torch, scipy, rsa, requests-toolbelt, pytesseract, pydantic, pyasn1-modules, proto-plus, pillow-heif, pdf2image, pandas, opencv-python, onnx, nltk, iopath, huggingface-hub, googleapis-common-protos, Deprecated, deepdiff, cryptography, contourpy, coloredlogs, botocore, aiosignal, torchvision, tokenizers, scikit-learn, pikepdf, pdfminer.six, onnxruntime, matplotlib, langsmith, grpcio-status, google-auth, dataclasses-json, aiohttp, unstructured-client, transformers, timm, pycocotools, 
pdfplumber, langchain-core, google-api-core, aiobotocore, unstructured, sentence-transformers, s3fs, layoutparser, langchain-text-splitters, effdet, unstructured-inference, langchain, google-cloud-vision, langchain-community\r\n", + "Successfully installed Deprecated-1.2.14 Pillow-10.4.0 SQLAlchemy-2.0.31 aiobotocore-2.13.1 aiohttp-3.9.5 aioitertools-0.11.0 aiosignal-1.3.1 annotated-types-0.7.0 antlr4-python3-runtime-4.9.3 async-timeout-4.0.3 backoff-2.2.1 botocore-1.34.131 cachetools-5.3.3 chardet-5.2.0 click-8.1.7 coloredlogs-15.0.1 contourpy-1.2.1 cryptography-42.0.8 cycler-0.12.1 dataclasses-json-0.6.7 deepdiff-7.0.1 effdet-0.4.1 emoji-2.12.1 filelock-3.15.4 filetype-1.2.0 flatbuffers-24.3.25 fonttools-4.53.0 frozenlist-1.4.1 fsspec-2024.6.1 google-api-core-2.19.1 google-auth-2.31.0 google-cloud-vision-3.7.2 googleapis-common-protos-1.63.2 grpcio-1.64.1 grpcio-status-1.62.2 huggingface-0.0.1 huggingface-hub-0.23.4 humanfriendly-10.0 iopath-0.1.10 jmespath-1.0.1 joblib-1.4.2 jsonpatch-1.33 jsonpath-python-1.0.6 kiwisolver-1.4.5 langchain-0.2.6 langchain-community-0.2.6 langchain-core-0.2.11 langchain-text-splitters-0.2.2 langdetect-1.0.9 langsmith-0.1.83 layoutparser-0.3.4 lxml-5.2.2 marshmallow-3.21.3 matplotlib-3.9.0 mpmath-1.3.0 multidict-6.0.5 mypy-extensions-1.0.0 networkx-3.3 nltk-3.8.1 numpy-1.26.4 omegaconf-2.3.0 onnx-1.16.1 onnxruntime-1.18.1 opencv-python-4.10.0.84 ordered-set-4.1.0 orjson-3.10.6 pandas-2.2.2 pdf2image-1.17.0 pdfminer.six-20231228 pdfplumber-0.11.1 pikepdf-9.0.0 pillow-heif-0.17.0 portalocker-2.10.0 proto-plus-1.24.0 protobuf-4.25.3 pyasn1-0.6.0 pyasn1-modules-0.4.0 pycocotools-2.0.8 pydantic-2.8.0 pydantic-core-2.20.0 pyparsing-3.1.2 pypdf-4.2.0 pypdfium2-4.30.0 pytesseract-0.3.10 python-iso639-2024.4.27 python-magic-0.4.27 python-multipart-0.0.9 pytz-2024.1 rapidfuzz-3.9.4 regex-2024.5.15 requests-toolbelt-1.0.0 rsa-4.9 s3fs-2024.6.1 safetensors-0.4.3 scikit-learn-1.5.1 scipy-1.14.0 sentence-transformers-3.0.1 sympy-1.12.1 
tabulate-0.9.0 tenacity-8.4.2 threadpoolctl-3.5.0 timm-1.0.7 tokenizers-0.19.1 torch-2.3.1 torchvision-0.18.1 tqdm-4.66.4 transformers-4.42.3 typing-inspect-0.9.0 tzdata-2024.1 unstructured-0.14.9 unstructured-client-0.23.8 unstructured-inference-0.7.36 unstructured.pytesseract-0.3.12 wrapt-1.16.0 yarl-1.9.4\r\n", + "\r\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.1\u001b[0m\r\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\r\n" + ] + } + ], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "## Load env variables\n", + "\n", + "In this example, we load the environment variables holding all the secrets from a file in Google Drive. To do this, we first mount the drive (if you do the same, note that a pop-up will ask you to connect to your Google Drive).\n", + "\n", + "Once the drive is mounted, we load the environment variables from a `.env` file. 
If you prefer another method for loading environment variables, go ahead and use it :)" + ], + "metadata": { + "id": "uNB1tW_F4WYk" + }, + "id": "uNB1tW_F4WYk" + }, + { + "cell_type": "code", + "source": [ + "from google.colab import drive\n", + "\n", + "drive.mount('/content/drive')" + ], + "metadata": { + "id": "EiKT1-3X5wlF" + }, + "id": "EiKT1-3X5wlF", + "execution_count": null, + "outputs": [] + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2024-07-03T13:43:55.575099Z", + "start_time": "2024-07-03T13:43:55.564950Z" + }, + "id": "5ccc8547c6539d3b", + "outputId": "eedd9e73-dd05-4c40-8e7f-3b40f997db71" + }, + "cell_type": "code", + "source": [ + "import os\n", + "import dotenv\n", + "\n", + "dotenv.load_dotenv('/path/to/your/.env') # replace with the path to your .env file in your Google Drive" + ], + "id": "5ccc8547c6539d3b", + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "## Create index in local SingleStoreDB\n", + "\n", + "Before we build the unstructured data preprocessing pipeline, let's create a local SingleStoreDB database and a table in it to store the processed data.\n", + "\n", + "For an example of a schema, refer to the [Unstructured documentation](https://docs.unstructured.io/api-reference/ingest/destination-connector/singlestore#singlestore-table-schema). If you use the schema from the documentation, make sure that the `dims` value for the embeddings matches the number of dimensions produced by the embedding model you choose. In this example it's set to 768, but your embedding model may produce vectors of a different dimension." 
+ ], + "metadata": { + "id": "-80mXole4qAe" + }, + "id": "-80mXole4qAe" + }, + { + "metadata": { + "id": "c771793db2af5ac9", + "outputId": "ea27ea1b-45aa-413e-a39c-48ff4a444076" + }, + "cell_type": "code", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Command executed successfully. Output:\n", + "5a3518036f1d1936efe9145165ff52d2478ac9c49c1c471adecb3726d5adbac2\n", + "\n" + ] + } + ], + "execution_count": null, + "source": [ + "import subprocess\n", + "\n", + "schema_path = \"/PATH/TO/schema.sql\" # replace with the path to your schema.sql file in your Google Drive\n", + "\n", + "password = \"pwd\"\n", + "\n", + "command = [\n", + " \"docker\", \"run\", \"-d\", \"--name\", \"singlestoredb-dev\",\n", + " \"-e\", f'ROOT_PASSWORD={password}',\n", + " \"--platform\", \"linux/amd64\",\n", + " \"-p\", \"3306:3306\", \"-p\", \"8080:8080\", \"-p\", \"9000:9000\",\n", + " \"-v\", f\"{schema_path}:/init.sql\",\n", + " \"ghcr.io/singlestore-labs/singlestoredb-dev:latest\",\n", + "]\n", + "\n", + "# Capture stderr as well as stdout so that a failure can be reported below.\n", + "process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", + "output, error = process.communicate()\n", + "\n", + "if process.returncode == 0:\n", + " print('Command executed successfully. Output:')\n", + " print(output.decode())\n", + "else:\n", + " print('Command failed. Error:')\n", + " print(error.decode())" + ], + "id": "c771793db2af5ac9" + }, + { + "cell_type": "markdown", + "source": [ + "Creating a database may take a few seconds. Let's check the status. We want to make sure that it says `healthy` before we begin writing into it." 
+ ], + "metadata": { + "id": "sxDcltMB5i17" + }, + "id": "sxDcltMB5i17" + }, + { + "metadata": { + "id": "22388ff3b34bee57", + "outputId": "7e06a3bd-6d9c-4baf-ce55-69ffbc3a77fa" + }, + "cell_type": "code", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\r\n", + "5a3518036f1d ghcr.io/singlestore-labs/singlestoredb-dev:latest \"/scripts/start.sh\" 18 seconds ago Up 17 seconds (healthy) 0.0.0.0:3306->3306/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:9000->9000/tcp singlestoredb-dev\r\n" + ] + } + ], + "execution_count": null, + "source": [ + "!docker ps" + ], + "id": "22388ff3b34bee57" + }, + { + "cell_type": "markdown", + "source": [ + "## PDFs ingestion and preprocessing pipeline" + ], + "metadata": { + "id": "a8X_GQ32GQnI" + }, + "id": "a8X_GQ32GQnI" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2024-07-03T13:44:10.587954Z", + "start_time": "2024-07-03T13:44:04.335563Z" + }, + "id": "99881631767c71e2" + }, + "cell_type": "code", + "source": [ + "from unstructured.ingest.v2.interfaces import ProcessorConfig\n", + "from unstructured.ingest.v2.pipeline.pipeline import Pipeline\n", + "from unstructured.ingest.v2.processes.chunker import ChunkerConfig\n", + "from unstructured.ingest.v2.processes.connectors.fsspec.s3 import (\n", + " S3ConnectionConfig,\n", + " S3DownloaderConfig,\n", + " S3IndexerConfig,\n", + " S3AccessConfig,\n", + ")\n", + "from unstructured.ingest.v2.processes.connectors.singlestore import (\n", + " SingleStoreAccessConfig,\n", + " SingleStoreConnectionConfig,\n", + " SingleStoreUploaderConfig,\n", + " SingleStoreUploadStagerConfig,\n", + ")\n", + "from unstructured.ingest.v2.processes.embedder import EmbedderConfig\n", + "from unstructured.ingest.v2.processes.partitioner import PartitionerConfig\n" + ], + "id": "99881631767c71e2", + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "source": [ + "Unstructured ingestion and 
transformation pipeline is assembled from a number of configs. They are passed as keyword arguments, so their order does not matter.\n", + "\n", + "* `ProcessorConfig`: defines general processing behavior.\n", + "* `S3IndexerConfig`, `S3DownloaderConfig`, `S3ConnectionConfig`: control data ingestion from S3, including the source location and authentication options.\n", + "* `PartitionerConfig`: describes partitioning behavior. Here we only set up authentication for the Unstructured API, but you can also control [partitioning parameters](https://docs.unstructured.io/api-reference/ingest/ingest-configuration/partition-configuration) such as the partitioning strategy through this config. We're going with the defaults.\n", + "* `ChunkerConfig`: defines the chunking strategy and chunk sizes.\n", + "* `EmbedderConfig`: sets up the connection to an embedding model provider to generate embeddings for the data chunks.\n", + "* `SingleStoreConnectionConfig`, `SingleStoreUploadStagerConfig`, `SingleStoreUploaderConfig`: control the final step of the pipeline: loading the data into SingleStore DB.\n", + "\n", + "You can also upload the data to a hosted deployment of SingleStore DB. Learn more in the [Unstructured documentation](https://docs.unstructured.io/api-reference/ingest/destination-connector/singlestore)." 
+ ], + "metadata": { + "id": "EQ0GXjYMGUqO" + }, + "id": "EQ0GXjYMGUqO" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2024-07-03T14:43:35.287223Z", + "start_time": "2024-07-03T14:43:27.689950Z" + }, + "id": "4972ded3e9fd8ca2", + "outputId": "ddc64a27-5d80-4d3c-e6dc-a9ec216bd407" + }, + "cell_type": "code", + "source": [ + "pipeline = Pipeline.from_configs(\n", + "\n", + " context=ProcessorConfig(\n", + " verbose=True,\n", + " tqdm=True,\n", + " num_processes=20,\n", + " ),\n", + "\n", + " indexer_config=S3IndexerConfig(remote_url=os.getenv(\"AWS_S3_NAME\")),\n", + " downloader_config=S3DownloaderConfig(),\n", + " source_connection_config=S3ConnectionConfig(\n", + " access_config=S3AccessConfig(\n", + " key=os.getenv(\"AWS_KEY\"),\n", + " secret=os.getenv(\"AWS_SECRET\"))\n", + " ),\n", + "\n", + " partitioner_config=PartitionerConfig(\n", + " partition_by_api=True,\n", + " api_key=os.getenv(\"UNSTRUCTURED_API_KEY\"),\n", + " partition_endpoint=os.getenv(\"UNSTRUCTURED_URL\"),\n", + " ),\n", + "\n", + " chunker_config=ChunkerConfig(\n", + " chunking_strategy=\"by_title\",\n", + " chunk_max_characters=512,\n", + " chunk_combine_text_under_n_chars=200,\n", + " ),\n", + "\n", + " embedder_config=EmbedderConfig(\n", + " embedding_provider=\"langchain-huggingface\",\n", + " embedding_model_name=\"BAAI/bge-base-en-v1.5\",\n", + " ),\n", + "\n", + " destination_connection_config=SingleStoreConnectionConfig(\n", + " access_config=SingleStoreAccessConfig(password=password),\n", + " host=\"localhost\",\n", + " port=3306,\n", + " database=\"ingest_test\",\n", + " user=\"root\",\n", + " ),\n", + " stager_config=SingleStoreUploadStagerConfig(),\n", + " uploader_config=SingleStoreUploaderConfig(table_name=\"elements\"),\n", + ")\n", + "\n", + "pipeline.run()" + ], + "id": "4972ded3e9fd8ca2", + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-07-03 10:43:27,694 MainProcess INFO Created index with configs: {\"remote_url\": 
\"s3://marias-rag-demo/\", \"protocol\": \"s3\", \"path_without_protocol\": \"marias-rag-demo/\", \"supported_protocols\": [\"s3\", \"s3a\", \"abfs\", \"az\", \"gs\", \"gcs\", \"box\", \"dropbox\", \"sftp\"], \"recursive\": false, \"file_glob\": null}, connection configs: {\"access_config\": \"***REDACTED***\", \"connector_type\": \"s3\", \"supported_protocols\": [\"s3\", \"s3a\"], \"endpoint_url\": null, \"anonymous\": false}\n", + "2024-07-03 10:43:27,696 MainProcess INFO Created download with configs: {\"download_dir\": null}, connection configs: {\"access_config\": \"***REDACTED***\", \"connector_type\": \"s3\", \"supported_protocols\": [\"s3\", \"s3a\"], \"endpoint_url\": null, \"anonymous\": false}\n", + "2024-07-03 10:43:27,696 MainProcess INFO Created partition with configs: {\"strategy\": \"auto\", \"ocr_languages\": null, \"encoding\": null, \"additional_partition_args\": null, \"skip_infer_table_types\": null, \"fields_include\": [\"element_id\", \"text\", \"type\", \"metadata\", \"embeddings\"], \"flatten_metadata\": false, \"metadata_exclude\": [], \"metadata_include\": [], \"partition_endpoint\": \"https://api.unstructuredapp.io/general/v0/general\", \"partition_by_api\": true, \"api_key\": \"*******\", \"hi_res_model_name\": null}\n", + "2024-07-03 10:43:27,697 MainProcess INFO Created chunk with configs: {\"chunking_strategy\": \"by_title\", \"chunking_endpoint\": \"https://api.unstructured.io/general/v0/general\", \"chunk_by_api\": false, \"chunk_api_key\": null, \"chunk_combine_text_under_n_chars\": 200, \"chunk_include_orig_elements\": null, \"chunk_max_characters\": 512, \"chunk_multipage_sections\": null, \"chunk_new_after_n_chars\": null, \"chunk_overlap\": null, \"chunk_overlap_all\": null}\n", + "2024-07-03 10:43:27,698 MainProcess INFO Created embed with configs: {\"embedding_provider\": \"langchain-huggingface\", \"embedding_api_key\": null, \"embedding_model_name\": \"BAAI/bge-base-en-v1.5\", \"embedding_aws_access_key_id\": null, 
\"embedding_aws_secret_access_key\": null, \"embedding_aws_region\": null}\n", + "2024-07-03 10:43:28,555 MainProcess INFO Created upload_stage with configs: {\"drop_empty_cols\": false}\n", + "2024-07-03 10:43:28,556 MainProcess INFO Created upload with configs: {\"table_name\": \"elements\", \"batch_size\": 100}, connection configs: {\"access_config\": \"***REDACTED***\", \"host\": \"localhost\", \"port\": 3306, \"user\": \"root\", \"database\": \"ingest_test\"}\n", + "2024-07-03 10:43:28,562 MainProcess INFO Running local pipline: index (S3Indexer) -> download (S3Downloader) -> partition (auto) -> chunk (by_title) -> embed (langchain-huggingface) -> upload_stage (SingleStoreUploadStager) -> upload (SingleStoreUploader) with configs: {\"reprocess\": false, \"verbose\": true, \"tqdm\": true, \"work_dir\": \"/Users/mk/.cache/unstructured/ingest/pipeline\", \"num_processes\": 20, \"max_connections\": null, \"raise_on_error\": false, \"disable_parallelism\": false, \"preserve_downloads\": false, \"download_only\": false, \"max_docs\": null, \"re_download\": false, \"uncompress\": false, \"status\": {}, \"semaphore\": null}\n", + "2024-07-03 10:43:28,631 MainProcess DEBUG Generated file data: FileData(identifier='marias-rag-demo/1501.00921v2.pdf', connector_type='s3', source_identifiers=SourceIdentifiers(filename='1501.00921v2.pdf', fullpath='marias-rag-demo/1501.00921v2.pdf', rel_path='1501.00921v2.pdf'), doc_type=, metadata=DataSourceMetadata(url='s3://marias-rag-demo/1501.00921v2.pdf', version='c05ebfb2844809ebb89513e64b33ad86', record_locator={\"protocol\": \"s3\", \"remote_file_path\": \"s3://marias-rag-demo/\"}, date_created='1717959666.0', date_modified='1717959666.0', date_processed='1720017808.631155', permissions_data=None), additional_metadata={\"Key\": \"*******\", \"LastModified\": \"2024-06-09T19:01:06+00:00\", \"ETag\": \"\\\"c05ebfb2844809ebb89513e64b33ad86\\\"\", \"Size\": 357133, \"StorageClass\": \"STANDARD\", \"type\": \"file\", \"size\": 357133, 
\"name\": \"marias-rag-demo/1501.00921v2.pdf\"}, reprocess=False)\n", + "2024-07-03 10:43:28,633 MainProcess DEBUG Generated file data: FileData(identifier='marias-rag-demo/1608.04880v1.pdf', connector_type='s3', source_identifiers=SourceIdentifiers(filename='1608.04880v1.pdf', fullpath='marias-rag-demo/1608.04880v1.pdf', rel_path='1608.04880v1.pdf'), doc_type=, metadata=DataSourceMetadata(url='s3://marias-rag-demo/1608.04880v1.pdf', version='f6f2fc85dabac91586ca6cce2d1202e1', record_locator={\"protocol\": \"s3\", \"remote_file_path\": \"s3://marias-rag-demo/\"}, date_created='1717959672.0', date_modified='1717959672.0', date_processed='1720017808.633616', permissions_data=None), additional_metadata={\"Key\": \"*******\", \"LastModified\": \"2024-06-09T19:01:12+00:00\", \"ETag\": \"\\\"f6f2fc85dabac91586ca6cce2d1202e1\\\"\", \"Size\": 945977, \"StorageClass\": \"STANDARD\", \"type\": \"file\", \"size\": 945977, \"name\": \"marias-rag-demo/1608.04880v1.pdf\"}, reprocess=False)\n", + "2024-07-03 10:43:28,635 MainProcess INFO Calling DownloadStep with 2 docs\n", + "2024-07-03 10:43:28,635 MainProcess INFO processing content async\n", + "2024-07-03 10:43:28,635 MainProcess WARNING async code being run in dedicated thread pool to not conflict with existing event loop: <_UnixSelectorEventLoop running=True closed=False debug=False>\n", + "download: 0%| | 0/2 [00:00\n", + "partition: 0%| | 0/2 [00:00 Connection:\n", + " conn = s2.connect(\n", + " host=host,\n", + " port=port,\n", + " database=database,\n", + " user=user,\n", + " password=password,\n", + " )\n", + " return conn\n", + "\n", + "\n", + "def validate(table_name: str, conn: Connection, num_elements: int):\n", + " with conn.cursor() as cur:\n", + " stmt = f\"select * from {table_name}\"\n", + " count = cur.execute(stmt)\n", + " assert (\n", + " count == num_elements\n", + " ), f\"found count ({count}) doesn't match expected value: {num_elements}\"\n", + " print(\"validation successful\")\n", + "\n", + "\n", + 
"def run_validation(\n", + " host: str,\n", + " port: int,\n", + " user: str,\n", + " database: str,\n", + " password: str,\n", + " table_name: str,\n", + " num_elements: int,\n", + "):\n", + " print(f\"Validating that table {table_name} in database {database} has {num_elements} entries\")\n", + " conn = get_connection(host=host, port=port, database=database, user=user, password=password)\n", + " validate(table_name=table_name, conn=conn, num_elements=num_elements)\n", + "\n", + "\n", + "run_validation(\n", + " host = \"localhost\",\n", + " port = 3306,\n", + " user = \"root\",\n", + " database = \"ingest_test\",\n", + " password = \"pwd\",\n", + " table_name = \"elements\",\n", + " num_elements = 345,\n", + ")" + ], + "id": "3857e5b5cd114abe", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Validating that table elements in database ingest_test has 345 entries\n", + "validation successful\n" + ] + } + ], + "execution_count": null + }, + { + "metadata": { + "id": "b3fb4febe23f1832" + }, + "cell_type": "markdown", + "source": [ + "## Retrieve relevant documents from SingleStoreDB\n" + ], + "id": "b3fb4febe23f1832" + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2024-07-08T18:15:20.149136Z", + "start_time": "2024-07-08T18:15:20.143790Z" + }, + "id": "393cb93e9706feba" + }, + "cell_type": "code", + "source": [ + "from sentence_transformers import SentenceTransformer\n", + "import json\n", + "\n", + "def get_embedding(query):\n", + " model = SentenceTransformer(\"BAAI/bge-base-en-v1.5\")\n", + " return model.encode(query, normalize_embeddings=True)\n", + "\n", + "def retrieve_documents(conn: Connection, query: str, num_results: int = 5):\n", + "\n", + " embedding = get_embedding(query)\n", + " embedding_list = embedding.tolist()\n", + " embedding_json = json.dumps(embedding_list)\n", + "\n", + " with conn.cursor() as cur:\n", + "\n", + " stmt = \"\"\"\n", + " SELECT\n", + " text,\n", + " filename,\n", + " 
DOT_PRODUCT(embeddings, JSON_ARRAY_PACK_F32(%s)) AS score\n", + " FROM elements\n", + " ORDER BY score DESC\n", + " LIMIT %s\n", + " \"\"\"\n", + "\n", + " cur.execute(stmt, [embedding_json, num_results])\n", + "\n", + " results = cur.fetchall()\n", + "\n", + " return results" + ], + "id": "393cb93e9706feba", + "outputs": [], + "execution_count": null + }, + { + "metadata": { + "ExecuteTime": { + "end_time": "2024-07-08T18:15:21.736297Z", + "start_time": "2024-07-08T18:15:20.686070Z" + }, + "id": "5999c70db27f0a19", + "outputId": "9081a8db-d7c9-45b8-e9f4-41c940c254df" + }, + "cell_type": "code", + "source": [ + "conn = get_connection(host=\"localhost\", port=3306, database=\"ingest_test\", user=\"root\", password=\"pwd\")\n", + "retrieve_documents(conn, \"pest control through mating disruption pheromones\")" + ], + "id": "5999c70db27f0a19", + "outputs": [ + { + "data": { + "text/plain": [ + "[('Controlling pest insects is a challenge of main importance to preserve crop pro- duction. In the context of Integrated Pest Management (IPM) programs, we develop a generic model to study the impact of mating disruption control using an artificial female pheromone to confuse males and adversely affect their mating opportunities. Consequently the reproduction rate is diminished leading to a decline in the population size. For more efficient control, trapping is used to capture the males attracted to the artificial',\n", + " '1608.04880v1.pdf',\n", + " 0.8843122720718384),\n", + " ('In order to maintain the pest population to a low level, we consider a control using female- pheromone-traps to disrupt male mating behaviour. More precisely, we take into account two aspects for the control. The first aspect consists of disturbing the mating between males and females to reduce the fertilisation opportunities, which in turn, reduces the number of offspring. This is done using traps that are releasing a female pheromone lure to which males are attracted. 
This leads to a reduction in the number',\n", + " '1608.04880v1.pdf',\n", + " 0.863226592540741),\n", + " ('which can explain the failure of the experiments mentioned above. Mathematical modelling can be very helpful to get a better understanding on the dynamics of the pest population, and various control strategies can be studied to optimise the control. Here we combine mating disruption using female-sex pheromones lures to attract males away from females in order to reduce the mating opportunities adversely affecting the rate reproduction. For more efficient control, lures can be placed in traps to reduce the male',\n", + " '1608.04880v1.pdf',\n", + " 0.8604112863540649),\n", + " ('Mating disruption using pheromones has been widely studied to control moth pests [11, 15] on various types of crops. An early demonstration of the applicability of MAT has been shown for the eradication of Bactrocera doraslis in the Okinawa Islands in 1984 [29]. More recently, the method has shown to be successful for the control of Tuta absoluta on tomato crops in Italian greenhouses [16]. Other successful cases are reported in [15], such as for the control of the pink bollworm Pectinophora gossypiella',\n", + " '1608.04880v1.pdf',\n", + " 0.8557718396186829),\n", + " ('Controlling insect pest population in environmentally respectful manner is a main challenge in IPM programs. Mating disruption using female sex-pheromone based lures falls within IPM requirements as it is species specific and leaves no toxic residues in the produce grown. In this work, we build a generic model, governed by a system of ODEs to simulate the dynamics of a pest population and its response to mating disruption control with trapping. 
From the theoretical analysis of the model, we identify two',\n", + " '1608.04880v1.pdf',\n", + " 0.8526250123977661)]" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "execution_count": null + }, + { + "metadata": { + "id": "bc437868b78a41b0" + }, + "cell_type": "code", + "outputs": [], + "execution_count": null, + "source": [], + "id": "bc437868b78a41b0" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file