docs: add a quick start page to the readme and docs #240

Merged (6 commits) on Feb 17, 2023
43 changes: 40 additions & 3 deletions README.md
@@ -49,11 +49,48 @@ about. Bricks in the library fall into three categories:
- :performing_arts: ***Staging bricks*** that format data for downstream tasks, such as ML inference
and data labeling.
<br></br>
## :eight_pointed_black_star: Installation
## :eight_pointed_black_star: Quick Start

Use the following instructions to get up and running with `unstructured` and test your
installation.

- Install the Python SDK with `pip install unstructured[local-inference]`
  - If you do not need to process PDFs or images, you can run `pip install unstructured`
- Install the following system dependencies if they are not already available on your system.
  Depending on what document types you're parsing, you may not need all of these.
  - `libmagic-dev`
  - `poppler-utils`
  - `tesseract-ocr`
  - `libreoffice`
- Run the following to install NLTK dependencies. `unstructured` will handle this automatically
  soon.
  - `python -c "import nltk; nltk.download('punkt')"`
  - `python -c "import nltk; nltk.download('averaged_perceptron_tagger')"`
- If you are parsing PDFs, run the following to install the `detectron2` model, which
  `unstructured` uses for layout detection:
  - `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`
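
Before running any code, it can help to check which of the system dependencies listed above are already present. The following is a minimal sketch using only the Python standard library. The helper name `missing_dependencies` and the executable names (`pdftoppm` for `poppler-utils`, `tesseract` for `tesseract-ocr`, `soffice` or `libreoffice` for `libreoffice`) are assumptions for illustration, not part of `unstructured`, and may differ on your platform.

```python
import ctypes.util
import shutil

# Package name -> executables that usually ship with it.
# These names are assumptions; adjust them for your distribution.
EXECUTABLES = {
    "poppler-utils": ["pdftoppm"],
    "tesseract-ocr": ["tesseract"],
    "libreoffice": ["soffice", "libreoffice"],
}


def missing_dependencies(executables=None, check_libmagic=True):
    """Return the package names whose executables or libraries were not found."""
    if executables is None:
        executables = EXECUTABLES
    missing = []
    for package, names in executables.items():
        # A package counts as present if any of its executables is on PATH.
        if not any(shutil.which(name) for name in names):
            missing.append(package)
    # libmagic is a shared library rather than an executable.
    if check_libmagic and ctypes.util.find_library("magic") is None:
        missing.append("libmagic-dev")
    return missing


if __name__ == "__main__":
    print("Missing system dependencies:", missing_dependencies() or "none")
```

An empty result only means the lookups succeeded; it does not guarantee that the installed versions are the ones `unstructured` expects.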

At this point, you should be able to run the following code:

To install the library, run `pip install unstructured`.
```python
from unstructured.partition.auto import partition

elements = partition(filename="example-docs/fake-email.eml")
```
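
To confirm that the call above returned something useful, one option is to count the returned elements by class name. This is a hedged sketch: `summarize` and `partition_and_summarize` are hypothetical helper names, and the only thing assumed about the return value of `partition` is that it is an iterable of element objects.

```python
from collections import Counter


def summarize(elements):
    """Count elements by the name of their class."""
    return Counter(type(element).__name__ for element in elements)


def partition_and_summarize(path):
    """Partition a file and summarize the result (hypothetical helper)."""
    # Imported lazily so this module loads even without unstructured installed.
    from unstructured.partition.auto import partition

    return summarize(partition(filename=path))


if __name__ == "__main__":
    try:
        print(partition_and_summarize("example-docs/fake-email.eml"))
    except (ImportError, FileNotFoundError) as error:
        print("Quick start check skipped:", error)
```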

And if you installed with `local-inference`, you should be able to run this as well:

```python
from unstructured.partition.auto import partition

elements = partition("example-docs/layout-parser-paper.pdf")
```
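
If one of the examples above fails with an import error, a quick way to see which Python packages are visible to the current interpreter is sketched below. It uses only the standard library; `is_installed` and `package_report` are hypothetical helper names, and the package list simply mirrors the packages named in the steps above.

```python
import importlib.util

# Top-level module names taken from the installation steps above.
PACKAGES = ("unstructured", "nltk", "detectron2")


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def package_report(packages=PACKAGES):
    """Map each package name to whether it can be found."""
    return {name: is_installed(name) for name in packages}


if __name__ == "__main__":
    for name, found in package_report().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A `missing` entry for `detectron2` would be expected if you skipped the PDF step, in which case only the email example applies.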


## :coffee: Installation Instructions for Local Development

## :coffee: Getting Started
The following instructions are intended to help you get up and running with `unstructured`
locally if you are planning to contribute to the project.

* Using `pyenv` to manage virtualenvs is recommended but not necessary
* Mac install instructions. See [here](https://github.com/Unstructured-IO/community#mac--homebrew) for more detailed instructions.
41 changes: 37 additions & 4 deletions docs/source/installing.rst
@@ -1,10 +1,43 @@
Installation
============

You can install the library by cloning the repo and running ``make install`` from the
root directory. Developers can run ``make install-local`` to install the dev and test
requirements alongside the base requirements. If you want a minimal installation without any
parser specific dependencies, run ``make install-base``.
Quick Start
-----------

Use the following instructions to get up and running with ``unstructured`` and test your
installation.

* Install the Python SDK with ``pip install unstructured[local-inference]``

  * If you do not need to process PDFs or images, you can run ``pip install unstructured``

* Install the following system dependencies if they are not already available on your system. Depending on what document types you're parsing, you may not need all of these.

  * ``libmagic-dev``
  * ``poppler-utils``
  * ``tesseract-ocr``
  * ``libreoffice``

* Run the following to install NLTK dependencies. ``unstructured`` will handle this automatically soon.

  * ``python -c "import nltk; nltk.download('punkt')"``
  * ``python -c "import nltk; nltk.download('averaged_perceptron_tagger')"``

* If you are parsing PDFs, run the following to install the ``detectron2`` model, which ``unstructured`` uses for layout detection:

  * ``pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"``

At this point, you should be able to run the following code:

.. code:: python

    from unstructured.partition.auto import partition

    elements = partition(filename="example-docs/fake-email.eml")

And if you installed with ``local-inference``, you should be able to run this as well:

.. code:: python

    from unstructured.partition.auto import partition

    elements = partition("example-docs/layout-parser-paper.pdf")


Installation with ``conda`` on Windows