diff --git a/docs/source/conf.py b/docs/source/conf.py index d93112ae..b6e98736 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -14,6 +14,14 @@ # import sys # sys.path.insert(0, os.path.abspath('.')) +# Adding Support for GIFs in Sphinx +from sphinx.builders.html import StandaloneHTMLBuilder +StandaloneHTMLBuilder.supported_image_types = [ + 'image/svg+xml', + 'image/gif', + 'image/png', + 'image/jpeg' +] # -- Project information ----------------------------------------------------- diff --git a/docs/source/doc_maintenance.rst b/docs/source/doc_maintenance.rst index 8819ad6e..4773f915 100644 --- a/docs/source/doc_maintenance.rst +++ b/docs/source/doc_maintenance.rst @@ -47,6 +47,10 @@ Assuming that your Sphinx installation was successful, Sphinx should build a loc open build/html/index.html +In case this command did not work, for example on Ubuntu 18.04 you may get a message like “Couldn’t get a file descriptor referring to the console”, try: :: + + see build/html/index.html + You now have a local build of the AboutCode documents. Improve AboutCode Documents diff --git a/docs/source/index.rst b/docs/source/index.rst index 7cccce5f..fdabf496 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,8 +5,10 @@ Guide ***** .. 
toctree:: - :maxdepth: 2 + :maxdepth: 3 + scancode-toolkit/index + scancode-workbench/index license help doc_maintenance diff --git a/docs/source/scancode-toolkit/clock.png b/docs/source/scancode-toolkit/clock.png new file mode 100644 index 00000000..6fbdf8f8 Binary files /dev/null and b/docs/source/scancode-toolkit/clock.png differ diff --git a/docs/source/scancode-toolkit/comprehensive_installation.rst b/docs/source/scancode-toolkit/comprehensive_installation.rst new file mode 100644 index 00000000..ff0ae9b5 --- /dev/null +++ b/docs/source/scancode-toolkit/comprehensive_installation.rst @@ -0,0 +1,107 @@ +Comprehensive Installation +========================== + +ScanCode requires Python 2.7.x and is tested on Linux, Mac, and Windows. Make sure Python 2.7 is installed first. + +System Requirements +------------------- + +- Hardware : ScanCode will run best with a modern X86 processor and at least 2GB of RAM and 250MB of disk. + +- Supported operating systems : ScanCode should run on these OSes: + + #. Linux: on most recent 64-bit Linux distributions (32-bit distros are only partially supported), + #. Mac: on recent Mac OSX (10.6.8 and up), + #. Windows: on Windows 7 and up (32- or 64-bit) using a 32-bit Python. + +Prerequisites +------------- +ScanCode needs a Python 2.7 interpreter. + +- On Linux: Use your package manager to install python2.7. If Python 2.7 is not available from your package manager, you must compile it from sources. For instance, visit https://github.com/dejacode/about-code-tool/wiki/BuildingPython27OnCentos6 for instructions to compile Python from sources on Centos. 
+ +- On Ubuntu 12.04, 14.04 and 16.04, you will need to install these packages first: ``python-dev bzip2 xz-utils zlib1g libxml2-dev libxslt1-dev`` + +- On Debian and Debian-based distros you will need to install these packages first: ``python-dev libbz2-1.0 xz-utils zlib1g libxml2-dev libxslt1-dev`` + +- On RPM-based distros, you will need to install these packages first: ``python-devel zlib bzip2-libs xz-libs libxml2-devel libxslt-devel`` + +- **On Windows**: + + Use the Python 2.7 32-bit (e.g. the Windows x86 MSI installer) for X86 regardless of whether you run Windows on 32-bit or 64-bit. DO NOT USE Python X86_64 installer even if you run 64 bit Windows. Download Python from this url: https://www.python.org/ftp/python/2.7.13/python-2.7.13.msi + + Install Python on the c: drive and use all default installer options (scancode will only look for python in c:\python27\python.exe). See the Windows installation section for more installation details. + +- On Mac: Download and install Python from this url: https://www.python.org/ftp/python/2.7.13/python-2.7.13-macosx10.6.pkg + +Do not use Unicode, non-ASCII in your installation Path +------------------------------------------------------- +There is a bug in underlying libraries that prevents this. + +Installation on Linux and Mac +----------------------------- + +Download and extract the latest ScanCode release from: +https://github.com/nexB/scancode-toolkit/releases/latest + +Open a terminal in the extracted directory and run:: + + ./scancode --help + +This will configure ScanCode and display the command line help. + +Installation on Windows +----------------------- + +- Download the latest ScanCode release zip file from https://github.com/nexB/scancode-toolkit/releases/latest + +- In Windows Explorer (called File Explorer on Windows 10), select the downloaded ScanCode zip and right-click. + +- In the pop-up menu select 'Extract All...' 
+ +- In the pop-up window 'Extract zip folders' ('Extract Compressed (Zipped) Folders' on Windows 10) use the default options to extract. + +- Once the extraction is complete, a new Windows Explorer/File Explorer window will pop up. + +- In this Explorer window, select the new folder that was created and right-click. + + * On Windows 10, double-click the new folder, select one of the files inside the folder (e.g., 'setup.py'), and right-click. +- In the pop-up menu select 'Properties'. + +- In the pop-up window 'Properties', select the Location value. Copy this to the clipboard and close the 'Properties' window. + +- Press the start menu button. (On Windows 10, click the search box or search icon in the taskbar.) + +- In the search box type:: + + cmd + +- Select 'cmd.exe' listed in the search results. (On Windows 10, you may see 'Command Prompt' instead -- select that.) + +- A new 'cmd.exe' window ('Command Prompt' on Windows 10) pops up. + +- In this window (aka a 'command prompt'), type the following (i.e., 'cd' followed by a space):: + + cd + +- Right-click in this window and select Paste. This will paste the path where you extracted ScanCode. + +- Press Enter. + +- This will change the current location of your command prompt to the root directory where scancode is installed. + +- Then type:: + + scancode -h + +- Press enter. This will configure your ScanCode installation. + +- Several messages are displayed followed by the scancode command help. + +- The installation is complete. + +Un-installation +--------------- + +- Delete the directory in which you extracted ScanCode. +- Delete any temporary files created in your system temp directory under a scancode directory. 
diff --git a/docs/source/scancode-toolkit/developement.rst b/docs/source/scancode-toolkit/developement.rst new file mode 100644 index 00000000..c43545e0 --- /dev/null +++ b/docs/source/scancode-toolkit/developement.rst @@ -0,0 +1,106 @@ +Development +=========== + +See CONTRIBUTING.rst for details: https://github.com/nexB/scancode-toolkit/blob/master/CONTRIBUTING.rst + +Code layout and conventions +--------------------------- + +Source code is in ``src/``. Tests are in ``tests/``. + +There is one Python package for each major feature under ``src/`` and a corresponding directory with the same name under ``tests`` (but this is not a package by design). + +Each test script is named ``test_XXXX`` and while we love to use ``py.test`` as a test runner, most tests have no dependencies on ``py.test``, only on the ``unittest`` module (with the exception of some command line tests that depend on pytest monkeypatching capabilities). + +When source or tests need data files, we store these in a ``data`` subdirectory. + +We use PEP8 conventions with a relaxed line length that can be up to 90'ish characters long when needed to keep the code clear and readable. + +We store pre-built bundled native binaries in ``bin/`` sub-directories of each ``src/`` package. These binaries are organized by OS and architecture. This ensures that ScanCode works out of the box either using a checkout or a download, without needing a compiler and toolchain to be installed. The corresponding source code for the pre-built binaries is stored in a separate repository at https://github.com/nexB/scancode-thirdparty-src + +We store bundled thirdparty components and libraries in the ``thirdparty`` directory. Python libraries are stored as wheels, possibly pre-built if the corresponding wheel is not available in the PyPI repository. Some of these components may be advanced builds with bug fixes or advanced patches. + +We write tests, a lot of tests, thousands of tests. 
Several tests are data-driven and use data files as test input and sometimes data files as test expectation (in this case using either JSON or YAML files). The tests should pass on Linux 64 bits, Windows 32 and 64 bits and on MacOSX 10.6.8 and up. We maintain two CI loops with Travis (Linux) at https://travis-ci.org/nexB/scancode-toolkit and Appveyor (Windows) at https://ci.appveyor.com/project/nexB/scancode-toolkit + +When finding bugs or adding new features, we add tests. See existing test code for examples. + +Running tests +------------- + +ScanCode comes with over 13,000 unit tests to ensure detection accuracy and stability across Linux, Windows and macOS: we kinda love tests, don't we? + +We use pytest to run the tests: call the ``py.test`` script to run the whole test suite. This script is installed by ``pytest``, which is bundled with a ScanCode checkout and installed when you run ``./configure``. + +If you are running from a fresh git clone and you run ``./configure`` and then ``source bin/activate`` the ``py.test`` command will be available in your path. + +Alternatively if you have already configured but are not in an activated "virtualenv" the ``py.test`` command is available under ``/bin/py.test`` + +(Note: paths here are for POSIX, but mostly the same applies to Windows) + +If you have a multiprocessor machine you might want to run the tests in parallel (and faster). For instance, ``py.test -n4`` runs the tests on 4 CPUs. We typically run the tests in verbose mode with ``py.test -vvs -n4``. + +You can also run a subset of the test suite as shown in the CI configs https://github.com/nexB/scancode-toolkit/blob/develop/appveyor.yml#L6 e.g., ``py.test -n 2 -vvs tests/scancode`` runs only the test scripts present in the ``tests/scancode`` directory. (You can pass a path to a specific test script file there too). + +See also https://docs.pytest.org for details or use the ``py.test -h`` command to show the many other options available. 
+ +One useful option is to run a select subset of the test functions matching a pattern with the ``-k`` option. For instance, ``py.test -vvs -k tcpdump`` would only run test functions that contain the string "tcpdump" in their name or their class name or module name. + +Another useful option after a test run with some failures is to re-run only the failed tests with the ``--lf`` option. For instance, ``py.test -vvs --lf`` would run only the test functions that failed in the previous run. + +pip requirements and the configure script +----------------------------------------- + +ScanCode uses the ``configure`` and ``configure.bat`` (and ``etc/configure.py`` behind the scenes) scripts to install a `virtualenv `_ , install required packaged dependencies as `pip `_ requirements and perform other configuration tasks such that ScanCode can be installed in a self-contained way with no network connectivity required. + +Earlier unreleased versions of ScanCode were using ``buildout`` to install and configure possibly complex dependencies. We had some improvements that were merged in the upstream ``buildout`` to support bootstrapping and installing without a network connection. When we migrated to use ``pip`` and ``wheels`` as a new, improved and faster way to install and configure dependencies we missed some of the features of ``buildout`` like the ``recipes``, being able to invoke arbitrary Python or shell scripts after installing packages and have scripts or requirements that are operating system-specific. + +ScanCode requirements and third-party Python libraries +------------------------------------------------------ + +In a somewhat unconventional way, all the required libraries are bundled, i.e. copied, in the repo itself in the thirdparty/ directory. If ScanCode were only a library it would not make sense. But it is first an application and having a well defined frozen set of dependent packages is important for an app. 
The benefit of this approach (combined with the ``configure`` script) is that a mere checkout of the repository contains everything needed to run ScanCode except for a Python interpreter. + +Using ScanCode as a Python library +---------------------------------- + +ScanCode can also be used as a Python library: it is available as a Python wheel on PyPI and can be installed with ``pip install scancode-toolkit``. + +Steps to cut a new release: +--------------------------- + +- run bumpversion with major, minor or patch to bump the version in: + + - ``src/scancode/__init__.py`` + - ``setup.py`` + - Update the CHANGELOG.rst + +- commit changes and push changes to develop: + + - ``git commit -m "commit message"`` + - ``git push --set-upstream origin develop`` + +- merge develop branch in master and tag the release. + + - ``git checkout master`` + - ``git merge develop`` + - ``git tag -a v1.6.1 -m "Release v1.6.1"`` + - ``git push --set-upstream origin master`` + - ``git push --set-upstream origin v1.6.1`` + +- draft a new release in GitHub, using the previous release blurb as a base. Highlight new and noteworthy changes from the CHANGELOG.rst. + +- run ``etc/release/release.sh`` locally. + +- upload the release archives created in the ``dist/`` directory to the GitHub release page. + +- save the release as a draft. Use the previous release notes to create notes in the same style. Ensure that the link to thirdparty source code is present. + +- test the downloads. + +- publish the release on GitHub + +- then build and publish the released wheel on PyPI. For this you need your own PyPI credentials (and authorization to publish PyPI releases: ask @pombredanne) and you need to have the ``twine`` package installed and configured. 
+ + - Build a ``.whl`` with ``python setup.py bdist_wheel`` + - Run twine with ``twine upload dist/`` + - Once uploaded check the published release at https://pypi.python.org/pypi/scancode-toolkit/ + - Then create a fresh local virtualenv and test the wheel installation with: ``pip install scancode-toolkit`` \ No newline at end of file diff --git a/docs/source/scancode-toolkit/documentation.rst b/docs/source/scancode-toolkit/documentation.rst new file mode 100644 index 00000000..b3504331 --- /dev/null +++ b/docs/source/scancode-toolkit/documentation.rst @@ -0,0 +1,29 @@ +Documentation +============= + +This page provides an index of current ScanCode user documentation. + +Download +-------- + +Download the latest release of ScanCode from our `release page `_ + +Installation +------------ + +See https://github.com/nexB/scancode-toolkit/blob/master/README.rst for more. + +User Guide +---------- + +The goal of ScanCode is to help you accurately detect provenance information in a codebase. +The output of the scan is either a JSON file, an HTML app or a plain HTML file. You can visualize the HTML format in a tree view format. +This view contains the following elements: + +- Code tree view - On the left side, you are able to navigate the code tree to understand what ScanCode has detected in each file. +- Path - The directory path of the analyzed file. +- Start/End Line - The line numbers where the Copyright or License has been detected. +- What - The type of detection, either Copyright or a License. +- Info - The name of the detected output. + +You can sort any column by clicking on its title. Search is also available in the top right corner for faster access to a specified resource or a type of detected license or copyright. 
\ No newline at end of file diff --git a/docs/source/scancode-toolkit/done.png b/docs/source/scancode-toolkit/done.png new file mode 100644 index 00000000..574e872d Binary files /dev/null and b/docs/source/scancode-toolkit/done.png differ diff --git a/docs/source/scancode-toolkit/faq.rst b/docs/source/scancode-toolkit/faq.rst new file mode 100644 index 00000000..81519b9d --- /dev/null +++ b/docs/source/scancode-toolkit/faq.rst @@ -0,0 +1,191 @@ +FAQ +=== + +Why ScanCode? +------------- + +We could not find an existing tool (open source or commercial) meeting our needs: + +- usable from the command line or as a library +- running on Linux, Mac and Windows +- written in a higher level language such as Python +- easy to extend and evolve + +How does ScanCode work? +----------------------- + +For license detection, ScanCode uses a (large) number of license texts and license detection 'rules' that are compiled in a search index. When scanning, the text of the target file is extracted and used to query the license search index and find license matches. + +For copyright detection, ScanCode uses a grammar that defines the most common and less common forms of copyright statements. When scanning, the target file text is extracted and 'parsed' with this grammar to extract copyright statements. + +Scan results are provided in various formats: + +- a JSON file, simple or pretty-printed, +- SPDX tag value or XML RDF formats, +- CSV, +- a simple unformatted HTML file that can be opened in a browser or as a spreadsheet. + +For each scanned file, the result contains: + +- its location in the codebase, +- the detected licenses and copyright statements, +- the start and end line numbers identifying where the license or copyright was found in the scanned file, and +- reference information for the detected license. + +For archive extraction, ScanCode uses a combination of Python modules, 7zip and libarchive/bsdtar to detect archive types and extract these recursively. 
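As a rough illustration of how the per-file results described in this section might be consumed, the sketch below groups file paths by detected license key. It is only a sketch under stated assumptions: it assumes an already-parsed JSON scan whose top-level object has a ``files`` list, with each entry carrying a ``path`` and a ``licenses`` list of objects that have a ``key``. These field names should be checked against the JSON produced by your ScanCode version, and ``summarize_licenses`` is a hypothetical helper, not part of ScanCode.

```python
import json
from collections import defaultdict


def summarize_licenses(scan):
    """Map each detected license key to the sorted file paths where it was found.

    `scan` is an already-parsed JSON scan result (a dict). The "files",
    "path", "licenses" and "key" field names are assumptions about the layout.
    """
    paths_by_key = defaultdict(set)
    for entry in scan.get("files", []):
        # A file may have no detections at all (empty list or null).
        for detection in entry.get("licenses") or []:
            paths_by_key[detection["key"]].add(entry["path"])
    return {key: sorted(paths) for key, paths in paths_by_key.items()}


if __name__ == "__main__":
    # Tiny hand-written sample standing in for real scan output.
    sample = json.loads("""
    {"files": [
        {"path": "samples/zlib/deflate.c", "licenses": [{"key": "zlib"}]},
        {"path": "samples/README", "licenses": []}
    ]}
    """)
    print(summarize_licenses(sample))
```

With a real scan, the sample string would be replaced by the contents of the JSON file written by the scan command.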
+ +Several other utility modules are used such as libmagic for file and mime type detection. + +How to add a new license for detection? +--------------------------------------- + +To add a new license, you first need to select a new and unique license key (mit and gpl-2.0 are some of the existing license keys). All licenses are stored as plain text files in the src/licensedcode/data/licenses directory using their key as part of the file names. + +You need to create a pair of files: + +- a file with the text of the license saved in a plain text file named key.LICENSE + +- a small text data file (in YAML format) named key.yml that contains license information such as:: + + key: my-license + name: My License + +The key name can contain only these symbols: + +- lowercase letters from a to z, +- numbers from 0 to 9, and +- the dash (-) and period (.) signs. No spaces. + +Save these two files in the ``src/licensedcode/data/licenses/`` directory. + +Done! + +See the ``src/licensedcode/data/licenses/`` directory for examples. + +How to add a new license detection rule? +---------------------------------------- + +A license detection rule is a pair of files: + +- a plain text rule file that is typically a variant of a license text, notice or license mention. +- a small text data file (in YAML format) documenting which license(s) should be detected for the rule text. + +To add a new rule, you need to pick a unique base file name. As a convention we like to include the license key(s) that should be detected in that name to make it more descriptive. For example: mit_and_gpl-2.0 is a good base name. Add a suffix to make it unique if there is already a rule with this base name. Do not use spaces or special characters in that name. + +Then create the rule file in the src/licensedcode/data/rules/ directory using this name, replacing selected_base_name with the base name you selected:: + + selected_base_name.RULE + +Save your rule text in this file. 
+ +Then create the YAML data file in the src/licensedcode/data/rules/ directory using this name:: + + selected_base_name.yml + +For a simple detection of the mit and gpl-2.0 license keys, the content of this file can be this YAML snippet:: + + licenses: + - mit + - gpl-2.0 + +Save these two files in the ``src/licensedcode/data/rules/`` directory and you are done! + +See the ``src/licensedcode/data/rules/`` directory for examples. + +More (advanced) rule options: + +- you can use a notes: text field to document this rule. + +- if no license should be detected for your .RULE text, do not add a list of license keys, just add a note. + +- .RULE text can contain special text regions that can be ignored when scanning for licenses. You can mark a template region in your rule text using {{double curly braces}} and up to five words can vary and still match this rule. You must add this field in your .yml data file to mark this rule as a template:: + + template: yes + +- By using a number after the opening braces, more than five words can be skipped. With {{10 double curly braces }} ten words would be skipped. + +- To mark a rule as detecting a choice of licenses, add this field in your .yml file:: + + license_choice: yes + +How to get started with development? +------------------------------------ + +ScanCode is primarily developed in Python with Python 2.7. + +Source code is at: + +- https://github.com/nexB/scancode-toolkit.git +- https://github.com/nexB/scancode-thirdparty-src.git + +Open a terminal, clone the scancode-toolkit repository, cd to the clone directory and run:: + + source configure + +On Windows open a command prompt, cd to the clone directory and run instead:: + + configure + +The configure script creates an isolated Python virtual environment ready for development usage. Rerun ``configure`` or ``source bin/activate`` when opening a new terminal. Rerun ``configure`` after a pull or a branch merge. + +To run all the tests, run this command. 
Be patient: there are several thousand tests!:: + + py.test + +To run the tests faster on four processors in parallel run:: + + py.test -n 4 + +See also https://github.com/nexB/scancode-toolkit/wiki/Development#running-tests for more details. + +More info: + +- Source code and license datasets are in the /src/ directory. +- Test code and test data are in the /tests/ directory. +- Datasets and test data are in /data/ sub-directories. +- Third-party components are vendored in the /thirdparty/ directory. ScanCode is self-contained and should not require network access for installation or configuration of third-party libraries. +- Additional pre-compiled vendored binaries are stored in bin/ sub-directories of the /src/ directory with their sources in this repo: https://github.com/nexB/scancode-thirdparty-src/ +- Porting ScanCode to other OSes (FreeBSD, etc.) is possible. Enter an issue for help. +- Bug reports and pull requests are welcome. +- See the wiki and CONTRIBUTING.rst for more info. + + +Can licenses be synchronized with the DejaCode license library? +--------------------------------------------------------------- + +The license keys are the same as those used in DejaCode. They are kept in sync by hand in the short term. There is also a ticket to automate that sync with DejaCode and possibly other sources. See https://github.com/nexB/scancode-toolkit/issues/41 + +How is ScanCode different from licensecheck? +-------------------------------------------- + +At a high level, ScanCode detects more licenses and copyrights than licensecheck does, reporting more details about the matches. It is likely slower. + +In more detail: ScanCode is a Python app using a data-driven approach (as opposed to carefully crafted regex): + +- for license scan, the detection is based on a (large) number of license full texts (~900) and license notices/rules (~1800) and is data driven as opposed to regex-driven. It detects exactly where in a file a license text is found. 
Just throw in more license texts to improve the detection. +- for copyright scan, the approach is natural language parsing (using NLTK) with POS tagging and a grammar; it has a few thousand tests. +- licenses and copyrights are detected in texts and binaries + +Licensecheck (available here for reference: https://metacpan.org/release/App-Licensecheck ) is a Perl script using hand-crafted regex patterns to find typical copyright statements and about 50 common licenses. There are about 50 license detection tests. + +A quick test (in July 2015, before a major refactoring but for this notice still valid) shows that there are several things not detected by licensecheck that are detected by ScanCode. + +How can I integrate ScanCode in my application? +----------------------------------------------- + +More specifically, does this tool provide an API that we can use to integrate it with our system, trigger the license check and use the result? + +In terms of API, there are two stable entry points: + +#. The JSON output when you use it as a command line tool from any language or when you call the scancode.cli.scancode function from a Python script. +#. Otherwise the scancode.cli.api module provides simple functions if you are only interested in calling a certain service on a given file (such as license detection or copyright detection) + +Can I install ScanCode in a Unicode path? +----------------------------------------- + +Not for now. See https://github.com/nexB/scancode-toolkit/issues/867 . There is a bug in virtualenv on Python2: https://github.com/pypa/virtualenv/issues/457 . At this stage and until we have completed the migration to Python 3 there is no way out but to use a path that contains only ASCII characters. + +The line numbers for a copyright found in a binary are weird. What do they mean? 
+-------------------------------------------------------------------------------- + +When scanning binaries, the line numbers are just a relative indication of where a detection was found: there is no such thing as lines in a binary. The numbers reported are based on the strings extracted from the binaries, typically broken as new lines with each NULL character. They can be safely ignored. \ No newline at end of file diff --git a/docs/source/scancode-toolkit/home.rst b/docs/source/scancode-toolkit/home.rst new file mode 100644 index 00000000..d5de2b21 --- /dev/null +++ b/docs/source/scancode-toolkit/home.rst @@ -0,0 +1,75 @@ +Home +==== + +ScanCode is a tool to scan code and detect licenses, copyrights and more + +Why ScanCode? +------------- + +Discovering the origin and license for a software component is important, but it is often much harder to accomplish than it should be because: + +- A typical software project may reuse tens or hundreds of third-party software components, +- Software authors do not always provide copyright and license information, and +- Copyright and license information that is provided may be hard to find and interpret. + +ScanCode tries to address this issue by offering: + +- A comprehensive code scanner that can detect origin or license information inside codebase files, +- A simple command line approach that runs on Windows, Linux, and Mac, +- Your choice of JSON or other output formats (HTML, CSV) for integration with other tools, +- Well-tested, easy to hack, and well-documented code, and +- Release of the code and reference data under attribution licenses (Apache 2.0 and CC-BY-1.0) + +What does ScanCode Toolkit do? +------------------------------ + +ScanCode finds the provenance information that is in your codebase with a focus on: + +- Copyright and other origin clues (emails, urls, authors,etc), and +- License notices and license text with reference information about detected licenses. 
+ +Using this data you can: + +- Discover the origin and license of the open source and third-party software components that you use, +- Create a software component inventory for your codebase, and +- Use this data to comply with open source license obligations such as attribution and redistribution. + +How does it work? +----------------- + +Given a codebase in a directory, ScanCode will: + +- Collect an inventory of the code files and classify the code using file types, +- Extract files from any archive using a general purpose extractor, +- Extract texts from binary files if needed, +- Use an extensible rules engine to detect open source license text and notices, +- Use a specialized parser to capture copyright statements, +- Identify packaged code and collect metadata from packages, +- Report the results in the format of your choice (JSON, CSV, etc.) for integration with other tools, or +- Browse the results using the AboutCode Manager companion app from https://github.com/nexB/aboutcode-manager to assist your analysis. + +ScanCode should enable you to identify the “easy” cases on your own, but a software development team will probably need to build internal expertise or use outside experts (like nexB) in many cases. + +ScanCode is written in Python and also uses other open source packages. + +Alternatives? +-------------- + +There are several utilities that do some of what ScanCode does - e.g. you can grep files for copyright and license text. This may work well for simple cases - e.g. at the single file level, but we created ScanCode for ourselves because this approach does not help you to see the recurring patterns of licenses and other provenance clues. 
+ +Or you can consider other tools such as: + +- FOSSology (open source, written in C, Linux only, GPL-licensed) +- Ninka (open source, written in Perl, GPL-licensed) +- Commercially-licensed tools, most written in Java + +History +------- + +ScanCode was originally created by nexB to support our software audit consulting services. We have used and continuously enhanced the underlying toolkit for six years. We decided to release ScanCode as open source software to give software development teams the opportunity to perform as much of the software audit function as they like on their own. + +If you have questions or are interested in nexB-provided training or support for ScanCode, please send us a note at info@scancode.io or visit http://www.nexb.com/. + +We are part of nexB Inc. and most of us are located in the San Francisco Bay Area. Our mission is to provide the tools and services that enable and accelerate component-based software development. Reusing software components is essential for the efficient delivery of software products and systems in every industry. + +Thank you for giving ScanCode a try! \ No newline at end of file diff --git a/docs/source/scancode-toolkit/ide_configaration.rst b/docs/source/scancode-toolkit/ide_configaration.rst new file mode 100644 index 00000000..ec8c4c4e --- /dev/null +++ b/docs/source/scancode-toolkit/ide_configaration.rst @@ -0,0 +1,25 @@ +IDE Configuration +================= + +The instructions below assume that you followed the `steps to set up a development environment `_ including a python virtualenv. + +PyCharm +------- + +Open the settings dialog and navigate to "Project Interpreter". Click on the gear button in the upper left corner and select "Add Local". Find the python binary in the virtualenv (``bin/python`` in the repository root) and confirm. Open a file that contains tests and set a breakpoint. Right click in the test and select "Debug ". 
Afterwards you can re-run the same test in the debugger using the appropriate keyboard shortcut (e.g. Shift-F9, depending on platform and configured layout). + +Visual Studio Code +------------------ + +Install the `Python extension from Microsoft `_. + +The ``configure`` script should have created a VSCode workspace directory with a basic ``settings.json``. To do this manually, add to or create the workspace settings file ``.vscode/settings.json``:: + + "python.pythonPath": "${workspaceRoot}/bin/python", + "python.unitTest.pyTestEnabled": true + +If you created the file, also add ``{`` and ``}`` on the first and last line respectively. + +When you open the project root folder in VSCode, the status bar should show the correct python interpreter and, after a while, a "Run Tests" button. If not, try restarting VSCode. + +Open a file that contains tests (e.g. ``tests/cluecode/test_copyrights.py``). Above the test functions you should now see "Run Test" and "Debug Test". Set a breakpoint in a test function and click on "Debug Test" above it. The debugger panel should show up on the left and show the program state at the breakpoint. Stepping over and into code seems not to work. Clicking one of those buttons just runs the test to completion. As a workaround, navigate to the function you want to step into, set another breakpoint and click on "continue" instead. \ No newline at end of file diff --git a/docs/source/scancode-toolkit/index.rst b/docs/source/scancode-toolkit/index.rst new file mode 100644 index 00000000..e662f969 --- /dev/null +++ b/docs/source/scancode-toolkit/index.rst @@ -0,0 +1,16 @@ +**Scancode-Toolkit Documentation** +================================== + +.. 
toctree:: + :maxdepth: 2 + + home + comprehensive_installation + developement + documentation + faq + ide_configaration + licence_policy_plugin + roadmap + runtime_performance_report + support \ No newline at end of file diff --git a/docs/source/scancode-toolkit/licence_policy_plugin.rst b/docs/source/scancode-toolkit/licence_policy_plugin.rst new file mode 100644 index 00000000..f6d990c7 --- /dev/null +++ b/docs/source/scancode-toolkit/licence_policy_plugin.rst @@ -0,0 +1,60 @@ +License Policy Plugin +===================== + +This plugin allows the user to apply policy details to a scancode scan, depending on which licenses are detected in a particular file. If a license specified in the Policy file is detected by scancode, this plugin will apply that policy information to the Resource as a new attribute: ``license_policy``. + +Policy File Specification +------------------------- +The Policy file is a YAML (``.yml``) document with the following structure:: + + license_policies: + - license_key: mit + label: Approved License + color_code: '#008000' + icon: icon-ok-circle + - license_key: agpl-3.0 + label: Approved License + color_code: '#008000' + icon: icon-ok-circle + - license_key: broadcom-commercial + label: Restricted License + color_code: '#FFcc33' + icon: icon-warning-sign + +The only required key is ``license_key``, which represents the scancode license key to match against the detected licenses in the scan results. + +In the above example, a descriptive label is added along with a color code and CSS ``id`` name for potential visual display. + +Using the Plugin +---------------- + +To apply License Policies during a ScanCode scan, specify the ``--license-policy`` option. 
+ +For example, use the following command to run a File Info and License scan on ``/path/to/codebase/``, using a License Policy file found at ``~/path/to/policy-file.yml``:: + + $ scancode -clipeu /path/to/codebase/ --license-policy ~/path/to/policy-file.yml --json-pp ~/path/to/scan-output.json + +Example Output +-------------- + +Here is an example of the ScanCode output after running ``--license-policy``:: + + { + "path": "samples/zlib/deflate.c", + "type": "file", + "licenses": [ + { + "key": "zlib", + ... + ... + ... + } + ], + "license_policy": { + "license_key": "zlib", + "label": "Approved License", + "color_code": "#008000", + "icon": "icon-ok-circle" + }, + "scan_errors": [] + } diff --git a/docs/source/scancode-toolkit/planned.png b/docs/source/scancode-toolkit/planned.png new file mode 100644 index 00000000..e66db8e6 Binary files /dev/null and b/docs/source/scancode-toolkit/planned.png differ diff --git a/docs/source/scancode-toolkit/roadmap.rst b/docs/source/scancode-toolkit/roadmap.rst new file mode 100644 index 00000000..1e63816e --- /dev/null +++ b/docs/source/scancode-toolkit/roadmap.rst @@ -0,0 +1,191 @@ +Roadmap +======= + +This is a high-level list of what we are working on and what is completed. 
+ +Legend +------ + +|white_check_mark| completed |clock1030| In progress |white_large_square| Planned, not started + +Work in Progress +---------------- + +(see Completed features below) + +Packages manifests and dependencies parsers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- |clock1030| Docker images base (as part of: https://github.com/pombredanne/conan ) #651 +- |clock1030| RubyGems base and dependencies #650 (code in https://github.com/nexB/scancode-toolkit-contrib/ ) +- |clock1030| Perl, CPAN (basic in https://github.com/nexB/scancode-toolkit-contrib/) +- |clock1030| Go : parsing for Godep in https://github.com/nexB/scancode-toolkit-contrib/ +- |clock1030| Windows PE #652 +- |white_large_square| RPMs dependencies #649 +- |white_large_square| Windows Nuget dependencies #648 +- |white_large_square| Bower packages #654 +- |white_large_square| Python dependencies #653 +- |white_large_square| CRAN +- |white_check_mark| Plain packages +- |white_large_square| other Java-related meta files (SBT, Ivy, Gradle, etc.) +- |white_large_square| Debian debs +- |white_large_square| other JavaScript (jspm, etc.) +- |white_large_square| other Linux distro packages + +License Detection +^^^^^^^^^^^^^^^^^ + +- |white_check_mark| support and detect license expressions (code in https://github.com/nexB/license-expression) +- |clock1030| support and detect composite licenses +- |white_large_square| support custom licenses +- |white_large_square| move licenses data set to external separate repository +- |white_check_mark| Improved unknown license detection +- |white_check_mark| sync with external sources (DejaCode, SPDX, etc.) 
+ +Copyrights +^^^^^^^^^^ + +- |white_check_mark| speed up copyright detection +- |white_check_mark| improved detected lines range +- |white_check_mark| streamline grammar of copyright parser +- |white_check_mark| normalize holders and authors for summarization +- |white_check_mark| normalize and streamline results data format + +Core features +^^^^^^^^^^^^^ + +- |white_check_mark| pre scan filtering (ignore binaries, etc) +- |white_check_mark| pre/post/output plugins! (worked as part of the GSoC by @yadsharaf ) +- |white_check_mark| scan plugins (e.g. plugins that run a scan to collect data) +- |clock1030| support Python 3 #295 +- |clock1030| transparent archive extraction (as opposed to on-demand with extractcode) +- |clock1030| scancode.yml configuration file for exclusions, defaults, scan failure conditions, etc. +- |white_large_square| support scan pipelines and rules to organize more complex scans +- |white_large_square| scan baselining, delta scan and failure conditions (such as license change, etc) (will be spawned as its own DeltaCode project) +- |white_large_square| dedupe and similarities to avoid re-scanning. For now, identical files are scanned only once. 
+- |white_large_square| Improved logging, tracing and error diagnostics +- |clock1030| native support for ABC Data (See https://github.com/nexB/aboutcode/blob/master/aboutcode-data/README.rst ) + +Classification, summarization and deduction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- |white_check_mark| File classification #426 +- |white_check_mark| summarize and aggregate data #377 at the top level + +Source code support (some will be spawned as their own tool) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- |clock1030| symbols : parsing complete in https://github.com/nexB/scancode-toolkit-contrib/ +- |clock1030| metrics : some elements in https://github.com/nexB/scancode-toolkit-contrib/ + +Compiled code support (will be spawned as their own tool) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +- |clock1030| ELFs : parsing complete in https://github.com/nexB/scancode-toolkit-contrib/ +- |clock1030| Java byte code : parsing complete in https://github.com/nexB/scancode-toolkit-contrib/ +- |clock1030| Windows PE : parsing complete in https://github.com/nexB/scancode-toolkit-contrib/ +- |clock1030| Mach-O : parsing complete in https://github.com/nexB/scancode-toolkit-contrib/ +- |white_large_square| Dalvik/dex + +Data exchange +^^^^^^^^^^^^^ + +- |white_check_mark| SPDX data conversion #338 + +Packaging +^^^^^^^^^ + +- |white_large_square| simpler installation, automated installer +- |white_check_mark| distro-friendly packaging +- |white_large_square| unbundle and package as multiple libraries (commoncode, extractcode, etc) + +Documentation +^^^^^^^^^^^^^ + +- |white_large_square| integration in a build/CI loop +- |white_large_square| end to end guide to analyze a codebase +- |white_large_square| hacking guides +- |white_large_square| API doc when using ScanCode as a library + +CI integration +^^^^^^^^^^^^^^ + +- |white_large_square| Plugins for CI (Jenkins, etc) +- |white_large_square| Integration for CI (Travis, Appveyor, 
Drone, etc) + + +Other work in progress +---------------------- + +- ScanCode server: Spawned as its own project: https://github.com/nexB/scancode-server . Will include Integration / webhooks for GitHub, Bitbucket. +- VulnerableCode: NVD and CVE lookups: Spawned as its own project: https://github.com/nexB/vulnerablecode +- ScanCode Workbench: desktop app for scan review: Spawned as its own project: https://github.com/nexB/scancode-workbench +- DependentCode: dynamic dependency resolution: Spawned as its own project: https://github.com/nexB/dependentcode + +Package mining and matching +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +(Note that this will be spawned as its own project) Some code is in https://github.com/nexB/scancode-toolkit-contrib/ + +- |clock1030| exact matching +- |clock1030| attribute-based matching +- |clock1030| fuzzy matching +- |white_large_square| peer-reviewed meta packages repo +- |white_large_square| basic mining of package repositories + +Other +^^^^^ + +- |white_large_square| Crypto code detection + + +Completed features +------------------ + +Core scans +^^^^^^^^^^ + +- |white_check_mark| exact license detection +- |white_check_mark| approximate license detection +- |white_check_mark| copyright detection +- |white_check_mark| file information (size, type, etc.) 
+- |white_check_mark| URLs, emails, authors + +Outputs and UI +^^^^^^^^^^^^^^ +- |white_check_mark| JSON compact and pretty +- |white_check_mark| plain HTML tables, also usable in a spreadsheet +- |white_check_mark| fancy HTML 'app' with a file tree navigation, and scan results filtering, search and sorting +- |white_check_mark| improved scans GUI now its own project: https://github.com/nexB/aboutcode-manager +- |white_check_mark| simple scan summary +- |white_check_mark| SPDX output + +Package and dependencies +^^^^^^^^^^^^^^^^^^^^^^^^ +- |white_check_mark| common model for packages data +- |white_check_mark| basic support for common packages format +- |white_check_mark| RPM packages base +- |white_check_mark| NuGet packages base +- |white_check_mark| Python packages base +- |white_check_mark| PHP Composer packages support with dependencies +- |white_check_mark| Java Maven POM packages support with dependencies +- |white_check_mark| npm packages support with dependencies + +Speed! +^^^^^^ +- |white_check_mark| accelerate license detection indexing and scanning; include caching +- |white_check_mark| scan using multiple processes to speed up overall scan +- |white_check_mark| cache per-file scan to disk and stream final results + +Other +^^^^^ +- |white_check_mark| archive extraction with extractcode +- |white_check_mark| conversion of scan results to CSV +- |white_check_mark| improved error handling, verbose and diagnostic output + +.. |white_check_mark| image:: done.png + :scale: 10 % +.. |white_large_square| image:: planned.png + :scale: 10 % +.. 
|clock1030| image:: clock.png + :scale: 10 % diff --git a/docs/source/scancode-toolkit/runtime_performance_report.rst b/docs/source/scancode-toolkit/runtime_performance_report.rst new file mode 100644 index 00000000..3c50ffdd --- /dev/null +++ b/docs/source/scancode-toolkit/runtime_performance_report.rst @@ -0,0 +1,11 @@ +Performance Report +================== + +These are reports of runtimes for real-life scans: + +2015-09-03 by @rrjohnston + +- On Ubuntu 12.04 x86_64 Python 2.7.3 and ScanCode Version 1.3.1 +- Specs: 40 threads (2 processors, 10 cores each, with hyperthreading) 3.1 GHz 128GB RAM 8TB controller RAID5 +- scanned 195676 files in about 16.7 hours or about 3.25 files per second (using the default license and copyright scans) +- notes: this version of ScanCode runs on a single thread so it does not make good use of extra processing power. \ No newline at end of file diff --git a/docs/source/scancode-toolkit/support.rst b/docs/source/scancode-toolkit/support.rst new file mode 100644 index 00000000..8b16d25d --- /dev/null +++ b/docs/source/scancode-toolkit/support.rst @@ -0,0 +1,6 @@ +Support +======= + +Post questions and bugs as GitHub tickets at: https://github.com/nexB/scancode-toolkit/issues + +Ask questions on StackOverflow using the [scancode] tag. 
\ No newline at end of file diff --git a/docs/source/scancode-workbench/building.rst b/docs/source/scancode-workbench/building.rst new file mode 100644 index 00000000..bfbe2640 --- /dev/null +++ b/docs/source/scancode-workbench/building.rst @@ -0,0 +1,36 @@ +Building Requirements +===================== + +Linux +----- + +- Python 2.7 +- `Node.js `_ 6.x or later +- npm 3.10.x or later but <= 5.2.0 (run ``npm install npm@5.2.0 -g``) + +MacOS +----- + +- Python 2.7 +- `Node.js `_ 6.x or later but <= 8.9.4 +- npm 3.10.x or later but <= 5.2.0 (run ``npm install npm@5.2.0 -g``) +- Command Line Tools for `Xcode `_ (run ``xcode-select --install`` to install) + +Windows +------- + +- `Node.js `_ 6.x or later +- npm 3.10.x or later but <= 5.2.0 (run ``npm install npm@5.2.0 -g``) +- Python v2.7.x + + * Make sure your Python path is set. To verify, open a command prompt and type ``python --version``. The installed Python version should be displayed. + +- Visual C++ Build Environment: + + * Either: + + - Option 1: Install `Visual C++ Build Tools 2015 `_ (or modify an existing installation) and select Common Tools for Visual C++ during setup. This also works with the free Community and Express for Desktop editions. + - Option 2: `Visual Studio 2015 `_ (Community Edition or better) + + * Note: Windows 7 requires `.NET Framework 4.5.1 `_ + * Launch cmd and run ``npm config set msvs_version 2015`` \ No newline at end of file diff --git a/docs/source/scancode-workbench/index.rst b/docs/source/scancode-workbench/index.rst new file mode 100644 index 00000000..4ddbfc81 --- /dev/null +++ b/docs/source/scancode-workbench/index.rst @@ -0,0 +1,13 @@ +**Scancode-Workbench Documentation** +==================================== + +ScanCode Workbench allows you to take the scan results from the ScanCode Toolkit and create a software inventory annotated with your summaries or conclusions (we call these Conclusions) at any level of the codebase you choose. 
+ +The attributes you add (e.g., Name, Version, Owner, License Expression, Copyright) to your Conclusion about a single package or file -- or a higher-level group of packages and/or files -- can then be exported to a JSON or SQLite file. In addition, Conclusions created in ScanCode Workbench can be exported to `DejaCode `_. + +.. toctree:: + :maxdepth: 3 + + scancode_workbench_views + building + platform_support \ No newline at end of file diff --git a/docs/source/scancode-workbench/navigate-code-tree.gif b/docs/source/scancode-workbench/navigate-code-tree.gif new file mode 100644 index 00000000..75aa8f04 Binary files /dev/null and b/docs/source/scancode-workbench/navigate-code-tree.gif differ diff --git a/docs/source/scancode-workbench/platform_support.rst b/docs/source/scancode-workbench/platform_support.rst new file mode 100644 index 00000000..5a197962 --- /dev/null +++ b/docs/source/scancode-workbench/platform_support.rst @@ -0,0 +1,72 @@ +ScanCode Workbench Platform Support +=================================== + +Our approach for platform support is to focus on one primary release for each of Linux, MacOS and Windows. The Priority definitions are: + +#. Primary - These are the primary platforms for build/test/release on an ongoing basis. +#. Secondary - These are platforms where the primary ScanCode Workbench release for the corresponding OS Group should be forward-compatible, e.g., Windows 7 build should work on Windows 10. Issues reported and traced to a Secondary platform may not be fixed. +#. Tertiary - These are any other platforms not listed as Primary or Secondary. In these cases, we will help users help themselves, but we are likely not to fix Issues that only surface on a Tertiary platform. 
+ ++-------------+------------------+------------+------------+--------------------------------------------+ +| OS Group | Desktop OS | Arch | Priority | Notes | +| | Version | | | | ++=============+==================+============+============+============================================+ +| Windows | Windows 7 SP1 | x64 | 1 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Windows | Windows 10 SP? | x64 | 2 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| MacOS | 10.9 Mavericks | x64 | 1 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| MacOS | 10.10 Yosemite | x64 | 2 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| MacOS | 10.11 El Capitan | x64 | 2 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| MacOS | 10.12 Sierra | x64 | 2 | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux Deb | Ubuntu 12.04 | x64 | 1 | From Electron Docs: The prebuilt ia32 | +| | | | | (i686) and x64 (amd64) binaries of | +| | | | | Electron are built on Ubuntu 12.04. | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux Deb | Ubuntu 14.xx | x64 | 2 | Verified to be able to run the prebuilt | +| | | | | binaries of Electron. | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux Deb | Ubuntu 16.xx | x64 | 2 | Verified to be able to run the prebuilt | +| | | | | binaries of Electron. 
| ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux | Fedora 21 | x64 | 2 | Verified to be able to run the prebuilt | +| | | | | binaries of Electron. | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux | Debian 8 | x64 | 2 | Verified to be able to run the prebuilt | +| | | | | binaries of Electron. | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux RH | CentOS 7.xx | x64 | ? | | ++-------------+------------------+------------+------------+--------------------------------------------+ +| Linux RH | RHEL 7.xx | x64 | ? | | ++-------------+------------------+------------+------------+--------------------------------------------+ + +Electron Supported Platforms +---------------------------- +https://electron.atom.io/docs/tutorial/supported-platforms/ + +The following platforms are supported by Electron: + +MacOS +^^^^^ + +Only 64-bit binaries are provided for MacOS, and the minimum MacOS version supported is MacOS 10.9. + +Windows +^^^^^^^ + +Windows 7 and later are supported, while older operating systems are not supported (and do not work). Both ia32 (x86) and x64 (amd64) binaries are provided for Windows. Please note: the ARM version of Windows is not supported for now. + +Linux +^^^^^ + +The prebuilt ia32 (i686) and x64 (amd64) binaries of Electron are built on Ubuntu 12.04, and the ARM binary is built against ARM v7 with hard-float ABI and NEON for Debian Wheezy. 
+ +Whether the prebuilt binary can run on a distribution depends on whether the distribution includes the libraries that Electron is linked to on the building platform, so only Ubuntu 12.04 is guaranteed to work, but the following platforms are also verified to be able to run the prebuilt binaries of Electron: + +- Ubuntu 12.04 and later +- Fedora 21 +- Debian 8 \ No newline at end of file diff --git a/docs/source/scancode-workbench/scancode-workbench-chart-summary.gif b/docs/source/scancode-workbench/scancode-workbench-chart-summary.gif new file mode 100644 index 00000000..be925711 Binary files /dev/null and b/docs/source/scancode-workbench/scancode-workbench-chart-summary.gif differ diff --git a/docs/source/scancode-workbench/scancode_workbench_views.rst b/docs/source/scancode-workbench/scancode_workbench_views.rst new file mode 100644 index 00000000..9f9ccb7c --- /dev/null +++ b/docs/source/scancode-workbench/scancode_workbench_views.rst @@ -0,0 +1,23 @@ +Scancode Workbench Views +======================== + +Directory Tree +-------------- + +An interactive directory tree is always present on the left side of the application. The tree is expandable and collapsible. This allows the user to navigate the codebase structure. If a directory is selected, only that directory and its sub-files and folders will be shown in the view. Similarly, if a single file is selected, only information for that selected file will be shown. + +.. image:: navigate-code-tree.gif + +Table View +---------- + +In the table view, the available clues detected by `ScanCode `_ are shown in a tabular format. A user can see provenance clues such as license and copyright information detected by ScanCode. A user can also see the file information (e.g. file type, file size, etc) and package information (package type, primary language of package) that was detected. The columns can be sorted as well as shown or hidden based on the user’s preferences. 
Searching for specific clues (license names, copyrights, etc.) is also available in this view. + +.. image:: table-view.gif + +Chart Summary View +------------------ + +With the chart summary view, a user can select a node in the directory tree (i.e., a directory, folder or file) and display a horizontal bar chart listing the values identified in the scanned codebase -- that is, the clues detected by ScanCode Toolkit -- for a number of different attributes. The attributes are a subset of the columns displayed in the table view, and can be selected by clicking the dropdown at the top of the view. The chart displays the full range of values for the selected directory tree node and attribute and the number of times each value occurs in the scanned codebase. + +.. image:: scancode-workbench-chart-summary.gif \ No newline at end of file diff --git a/docs/source/scancode-workbench/table-view.gif b/docs/source/scancode-workbench/table-view.gif new file mode 100644 index 00000000..1d5335f1 Binary files /dev/null and b/docs/source/scancode-workbench/table-view.gif differ