Add docs #328

Open: wants to merge 41 commits into base: master
c39db6e
Quickstart docs, add mustache template
lahdjirayhan Dec 11, 2021
15eba41
Modify mustache file
lahdjirayhan Dec 12, 2021
df049bc
Update .gitignore to not track documentation builds
lahdjirayhan Dec 12, 2021
6979c5d
Delete build directory
lahdjirayhan Dec 12, 2021
a2d2750
Add initial documentation and docstring
lahdjirayhan Dec 12, 2021
ee09d9a
Add example/tutorial
lahdjirayhan Dec 12, 2021
80d1981
Merge branch 'master' into add-docs
lahdjirayhan Dec 12, 2021
a61347e
Update .gitignore to not track .vscode
lahdjirayhan Dec 14, 2021
aa38bf0
Modify my docstring to match owner expectation
lahdjirayhan Dec 14, 2021
2dade08
Rewrite index.rst
lahdjirayhan Dec 14, 2021
a844c45
Merge branch 'master' into add-docs
lahdjirayhan Dec 15, 2021
c2eabfb
Add examples
lahdjirayhan Dec 16, 2021
00e08d8
Add docstrings on Twitter module
lahdjirayhan Dec 16, 2021
b49fef6
Add docstrings on Instagram module
lahdjirayhan Dec 18, 2021
34cf780
Add docstrings on Telegram module
lahdjirayhan Dec 18, 2021
c62a9b4
Add docstring to Reddit module
lahdjirayhan Dec 18, 2021
4e2d184
Add docstring to VK module
lahdjirayhan Dec 18, 2021
a733e26
Fix docstring formatting
lahdjirayhan Dec 18, 2021
75b287b
Merge branch 'master' into backup-add-docs
lahdjirayhan Dec 26, 2021
ab1dbe9
Try autosummary
lahdjirayhan Dec 18, 2021
b5dcf41
Update .gitignore to not track autogenerated _autosummary
lahdjirayhan Dec 27, 2021
26fedeb
Add templates
lahdjirayhan Dec 27, 2021
44ca124
Slight fix
lahdjirayhan Dec 27, 2021
d2ba2c9
Modify template to remove double init in docs
lahdjirayhan Dec 27, 2021
319b575
Add docs to facebook module
lahdjirayhan Dec 28, 2021
ccbe847
Add docs in weibo module
lahdjirayhan Dec 28, 2021
59f69e5
Add docs to base Scraper class' get_items
lahdjirayhan Dec 28, 2021
294f6b7
Modify index.rst to have some toctree structure for entire package
lahdjirayhan Dec 28, 2021
ca5bf06
Merge branch 'master' into add-docs
lahdjirayhan Jan 5, 2022
c9a5c08
Update/add docstrings
lahdjirayhan Jan 15, 2022
845ff32
Merge branch 'master' into add-docs
lahdjirayhan Jan 15, 2022
a10a195
Update/add docstrings again
lahdjirayhan Jan 15, 2022
8e697a3
Add/update docs for mastodon objects
lahdjirayhan Jan 18, 2022
955bee8
Add/update docs for twitter
lahdjirayhan Jan 18, 2022
31d495e
Update .gitignore to not track venv folder
lahdjirayhan Jan 18, 2022
36f4d0e
Detect snscrape version in docs using importlib
lahdjirayhan Jan 18, 2022
2cb811b
Update index.rst to add mastodon
lahdjirayhan Jan 18, 2022
fe818fa
Fix incorrect docstring on TwitterTweetScraper
lahdjirayhan Jan 28, 2022
80627eb
Merge branch 'master' into add-docs
lahdjirayhan Feb 17, 2022
0eebb3b
Retrieve everything except project name from importlib.metadata
lahdjirayhan Feb 17, 2022
0832e95
Fix typo in index.rst
lahdjirayhan Feb 17, 2022
4 changes: 4 additions & 0 deletions .gitignore
@@ -2,3 +2,7 @@ __pycache__/
/dist/
/snscrape.egg-info/
/.eggs/
/docs/_build/**
/docs/_autosummary/**
.vscode/
venv/
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
32 changes: 32 additions & 0 deletions docs/_templates/custom-class-template.rst
@@ -0,0 +1,32 @@
{{ fullname | escape | underline}}

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}
:members:
:show-inheritance:
:inherited-members:

{% block methods %}


{% if methods %}
.. rubric:: {{ _('Methods') }}

.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}

.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
66 changes: 66 additions & 0 deletions docs/_templates/custom-module-template.rst
@@ -0,0 +1,66 @@
{{ fullname | escape | underline}}

.. automodule:: {{ fullname }}

{% block attributes %}
{% if attributes %}
.. rubric:: Module Attributes

.. autosummary::
:toctree:
{% for item in attributes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block functions %}
{% if functions %}
.. rubric:: {{ _('Functions') }}

.. autosummary::
:toctree:
{% for item in functions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block classes %}
{% if classes %}
.. rubric:: {{ _('Classes') }}

.. autosummary::
:toctree:
:template: custom-class-template.rst
{% for item in classes %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block exceptions %}
{% if exceptions %}
.. rubric:: {{ _('Exceptions') }}

.. autosummary::
:toctree:
{% for item in exceptions %}
{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

{% block modules %}
{% if modules %}
.. rubric:: Modules

.. autosummary::
:toctree:
:template: custom-module-template.rst
:recursive:
{% for item in modules %}
{{ item.split('.')[-1] }}
{%- endfor %}
{% endif %}
{% endblock %}
11 changes: 11 additions & 0 deletions docs/api-reference.rst
@@ -0,0 +1,11 @@
.. This file should contain API reference. Ideally, an automatic discovery/summary.

API Reference
=============

.. autosummary::
:toctree: _autosummary
:template: custom-module-template.rst
:recursive:

snscrape
87 changes: 87 additions & 0 deletions docs/conf.py
@@ -0,0 +1,87 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))

# Tools for importing snscrape at build time
# Avoid name conflict with sphinx configuration variable "version"
from importlib import import_module
from importlib.metadata import metadata


# -- Project information -----------------------------------------------------

# Project name
project = 'snscrape'

# Metadata
_metadata = metadata(project)

# Version in format 0.4.0.20211208
release = _metadata['version']
author = _metadata['author']

_major, _minor, _patch, _yyyymmdd = release.split('.')

YEAR = _yyyymmdd[0:4]
copyright = f'{YEAR}, {author}'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
# 'sphinx_autodoc_typehints'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# -- Custom extension options ------------------------------------------------

# Put type hint in description instead of signature
# Note: the docstrings are overridden if autodoc_typehints is used
autodoc_typehints = 'description'

# Set 'both' to use both class and __init__ docstrings.
autoclass_content = 'both'

# Might want to look at it:
# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_type_aliases
# autodoc_type_aliases = {}

# Turn on autosummary
autosummary_generate = True

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'nature'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
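
The project-information step in conf.py unpacks the release string into exactly four dot-separated parts, which works for a version such as 0.4.0.20211208. The following is a minimal sketch of a more tolerant alternative, not part of this PR: the helper name `copyright_year` and the fallback to the current year are assumptions made for illustration.

```python
import datetime
import re


def copyright_year(release: str) -> str:
    """Return the year embedded in a release such as '0.4.0.20211208'.

    Hypothetical helper: if no component is an eight-digit date,
    fall back to the current year instead of raising.
    """
    for part in release.split('.'):
        if re.fullmatch(r'\d{8}', part):
            return part[:4]
    return str(datetime.date.today().year)


print(copyright_year('0.4.0.20211208'))
```

With a helper like this, a release string that has more or fewer than four components would no longer stop the documentation build with an unpacking error.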
30 changes: 30 additions & 0 deletions docs/google.mustache
@@ -0,0 +1,30 @@
{{! Modified Google Docstring Template }}
{{summaryPlaceholder}}
{{extendedSummaryPlaceholder}}
{{#parametersExist}}
Args:
{{#args}}
{{var}}: {{descriptionPlaceholder}}
{{/args}}
{{#kwargs}}
{{var}}: {{descriptionPlaceholder}}. Defaults to {{&default}}.
{{/kwargs}}
{{/parametersExist}}
{{#exceptionsExist}}
Raises:
{{#exceptions}}
{{type}}: {{descriptionPlaceholder}}
{{/exceptions}}
{{/exceptionsExist}}
{{#returnsExist}}
Returns:
{{#returns}}
{{descriptionPlaceholder}}
{{/returns}}
{{/returnsExist}}
{{#yieldsExist}}
Yields:
{{#yields}}
{{typePlaceholder}}: {{descriptionPlaceholder}}
{{/yields}}
{{/yieldsExist}}
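
For orientation, here is a sketch of a docstring with the section layout this template produces (Args, Raises, Returns, with "Defaults to" on keyword arguments). The function `clamp` is a hypothetical example and does not come from snscrape.

```python
def clamp(value: int, low: int = 0, high: int = 100) -> int:
    """Restrict a number to a closed interval.

    Args:
        value: The number to restrict.
        low: Lower bound. Defaults to 0.
        high: Upper bound. Defaults to 100.

    Raises:
        ValueError: If low is greater than high.

    Returns:
        The value, moved to the nearest bound if it lies outside the interval.
    """
    if low > high:
        raise ValueError('low must not exceed high')
    return max(low, min(value, high))


print(clamp(150))
```

Docstrings in this Google style are what the `sphinx.ext.napoleon` extension enabled in conf.py is meant to parse.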
112 changes: 112 additions & 0 deletions docs/index.rst
@@ -0,0 +1,112 @@
.. snscrape documentation master file, created by
sphinx-quickstart on Sat Dec 11 06:18:23 2021.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to snscrape's documentation!
====================================

``snscrape`` is a scraper for social networking services (SNS). It scrapes through things like user profiles, hashtags, or searches and returns the discovered items, usually posts. ``snscrape`` supports several SNS:

================== =======================================================
Platform           Can scrape for items in:
================== =======================================================
Twitter            User profile, hashtag, search, thread, list, trending
Instagram          User profile, hashtag, location
Reddit             User profile, subreddit, search (via Pushshift)
Facebook           User profile, group, community (for visitor posts)
Telegram           Channel
VKontakte          User profile
Weibo (Sina Weibo) User profile
Mastodon           User profile, thread
================== =======================================================

``snscrape`` works without logins or authentication. The drawback, however, is that some platforms may (now or in the future) impose limits on unauthenticated requests coming from your IP address. Such IP-based limits are usually temporary.

``snscrape`` can be used either from the CLI or imported as a library.

CLI usage
---------

The generic syntax of snscrape's CLI is:

.. code-block:: console

snscrape [GLOBAL-OPTIONS] SCRAPER-NAME [SCRAPER-OPTIONS] [SCRAPER-ARGUMENTS...]

``snscrape --help`` and ``snscrape SCRAPER-NAME --help`` provide details on the options and arguments. ``snscrape --help`` also lists all available scrapers.

The default output of the CLI is the URL of each result.

Some noteworthy global options are:

* ``--jsonl`` to get output as JSONL. This includes all information extracted by ``snscrape`` (e.g. message content, datetime, images; details vary by scraper).
* ``--max-results NUMBER`` to only return the first ``NUMBER`` results.
* ``--with-entity`` to get an item on the entity being scraped, e.g. the user or channel. This is not supported on all scrapers. (You can use this together with ``--max-results 0`` to only fetch the entity info.)

**Examples**

Collect all tweets by Jason Scott (@textfiles):

.. code-block:: console

snscrape twitter-user textfiles

It's usually useful to redirect the output to a file for further processing, e.g. in bash using the filename ``twitter-@textfiles``:

.. code-block:: console

snscrape twitter-user textfiles >twitter-@textfiles


To get the latest 100 tweets with the hashtag #archiveteam:

.. code-block:: console

snscrape --max-results 100 twitter-hashtag archiveteam
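
Output saved with ``--jsonl`` holds one JSON object per line, so it can be read back with the standard library. The sketch below uses an in-memory stand-in for such a file; the field names ``url`` and ``content`` are illustrative assumptions, since the actual fields vary by scraper.

```python
import io
import json

# Hypothetical stand-in for a file written by redirecting --jsonl output.
# The field names are assumptions for illustration only.
sample = io.StringIO(
    '{"url": "https://example.org/status/1", "content": "first"}\n'
    '{"url": "https://example.org/status/2", "content": "second"}\n'
)


def read_jsonl(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


items = list(read_jsonl(sample))
print(len(items), items[0]['url'])
```

For a real file, the ``io.StringIO`` object would be replaced by an open file handle.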


Library usage
-------------

The general steps are:

#. **Instantiate a scraper object.**
   ``snscrape`` provides various scraper classes, each implementing its own way of gathering items. For example, :class:`TwitterSearchScraper` gathers tweets via search query, and :class:`TwitterUserScraper` gathers tweets from a specified user.
#. **Call the scraper's** ``get_item()`` **method.**
Review comment (Contributor): get_item()?

``get_item()`` is an iterator and yields one item at a time.
Review comment (Contributor): Same here


Each scraper class provides different options and arguments. Refer to the class signature for more information, e.g. in Jupyter Notebook it can be done via::

?TwitterSearchScraper

**Examples**

Collect tweets by searching for "omicron variant", limit the results to the first 100 tweets, and save the results to a list:

.. code-block:: python

   from snscrape.modules.twitter import TwitterSearchScraper

   scraper = TwitterSearchScraper('omicron variant')

   result = []

   # Stop before appending the 101st item so exactly 100 are kept.
   for i, item in enumerate(scraper.get_items()):
       if i >= 100:
           break
       result.append(item)
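
The same cap can be expressed with ``itertools.islice``, which stops consuming a generator after a fixed number of items. The sketch below uses a hypothetical ``FakeScraper`` class that only mimics the ``get_items()`` generator so the example runs without network access; it is not part of ``snscrape``.

```python
from itertools import islice


class FakeScraper:
    """Hypothetical stand-in for a scraper; it only mimics get_items()."""

    def get_items(self):
        n = 0
        while True:  # endless, like a long stream of results
            yield f'item-{n}'
            n += 1


scraper = FakeScraper()

# islice takes the first 100 items and leaves the rest unconsumed.
result = list(islice(scraper.get_items(), 100))
print(len(result))
```

With a real scraper object, only the construction of ``scraper`` would change; the ``islice`` call stays the same.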

API reference
=============

.. toctree::
:maxdepth: 5

api-reference

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`