Permalink
Browse files

cEP 9: Extract and use information

Extract and utilize information from project files.
  • Loading branch information...
satwikkansal committed May 25, 2017
1 parent 3c5ec97 commit eb4f194fe8b4cdede65061c08f880c2880936332
Showing with 209 additions and 0 deletions.
  1. +209 −0 cEP-0009.md
View
@@ -0,0 +1,209 @@
| Metadata | |
| ------------ |----------------------------------------------------------|
| cEP | 9 |
| Version | 0.1 |
| Title | Extract and utilize information from project files |
| Authors | Satwik Kansal <satwikkansal@gmail.com> |
| Status | Proposed |
| Type | Feature |
## Abstract
There are many kinds of project files like ".editorconfig", "Gruntfile.js", etc. that contain meta-information that can used by tools like coala-quickstart to provide better analysis. This cEP proposes a mechnaism to represent and extract information from project files and an interface to utilize the extracted information.
## Proposed Approach
### 1. Implementation of the “Info” class
A subclass of `Info` class will represent a specific kind of information. For example, the class `IgnorePathInfo` may represent “ignored file paths”.
```py
class Info:
description = "Some information"
# for validation
value_type = ("data-type of the information value, can also be an ``Enum``"
"of possible choices.")
def __init__(self,
source,
value: value_type
scope=None):
"""
:param source: Source from where the information is discovered.
:param value: Value of the infomation to be represented.
:param scope: Scope of the applicability of the information. Some
of the values may be "bear-specific" or "global".
"""
self.source = source
self._value = value
# optional metadata
self.scope = scope
@property
def value(self):
return self._value
@property
def name(self):
return self.__class__.__name__
```
### 2. Implementation of “InfoExtractor” class
Any derived class of the `InfoExtractor` abstract class consists of:
- Methods for parsing the project file.
- Methods to find information in the parsed file.
- A collection of `Info` objects representing the information extracted from a particular kind of project file.
- Files to parse and extract information from.
```py
from coalib.parsing.Globbing import glob
class InfoExtractor:
# Kinds of files supported by the InfoExtractor
supported_files = []
# Meta-data
# A list of ``Info`` classes relevant to the class
supported_information_kinds = []
def __init__(self,
target_files,
project_directory):
"""
:param target_files: list of file globs to extract information
from.
:param project_directory: Absolute path to project directory in which
the target files will be searched.
"""
self.target_files = target_files
self.directory = project_directory
self._information = dict()
@property
def information(self):
"""
Return extracted information (if any)
"""
return self._information
def parse_file(self, filename):
"""
Parses the given file and returns the parsed file.
"""
raise NotImplementedError
def _add_info(self, info):
"""
Method to organize and add the supplied information in self.information attribute.
:param info: Collection of ``Info`` instances.
"""
pass
def extract_information(self):
"""
Extracts the information, saves in the object and returns it.
"""
filenames = retrieve_files(self.target_files)
for fname in filenames:
pfile = self.parse_file(fname)
file_info = find_information(pfile)
if file_info:
self._add_info(fname, file_info)
return self.information
@staticmethod
def find_information(self, fname, parsed_file):
"""
Returns a collection of ``Info`` instances.
"""
raise NotImplementedError
@staticmethod
def retrieve_files(file_globs, directory):
"""
Returns matched filenames acoording to the list of file globs and
supported files of the extractor.
"""
matched_files = []
for g in file_globs:
matched_files.append(glob(g))
return matched_files
```
## Example
```py
import json
class LicenseUsedInfo(Info):
description = "License of the project."
value_type = Enum('MIT', 'Apache 2.0', 'GNU GPL')
class PackageJSONInfoExtractor(InfoExtractor):
supported_files = "package.json"
supported_information_kinds = [LicenseUsedInfo]
def parse_file(self, filename):
return json.loads(filename)
def find_information(self, fname, parsed_file):
if parsed_file.get("license"):
return LicenseUsedInfo(
fname,
parsed_file["license"])
project_dir = "any_project_dir"
pjson_info = PackageJSONInfoExtractor(
"package.json",
project_dir).extract_information()
license_used = pjson_info.get("LicenseUsedInfo")
```
Another example
```py
class LinterUsedInfo(Info):
description = "Linter used in project."
value_type = str
def __init__(self,
source,
value: value_type,
is_supported_by_coala=True):
super().__init__(source, value)
self.is_supported_by_coala = is_supported_by_coala
class GruntfileExtractor(InfoExtractor):
def parse_file(self, filename):
# parse the JS file using some library function
parsed_file = parse_javascript(filename)
return parsed_file
def find_information(self, fname, parsed_file):
# extract lint subtasks that coala supports from parsed Gruntfile.js
linters_used = ["jshint", "csslint"]
return [LinterUsedInfo(fname, l, True) for l in linters_used]
gfe = GruntfileExtractor("Gruntfile.js", project_dir)
gfe.extract_information()
coala_supported_linters = gfe.information.get("LinterUsedInfo")
```
## Possible Applications
This mechanism can be used by coala-quickstart to automatically extract some of the setting values for the bears and also to suggest project-relevant bears.
Related Links:
- http://projects.coala.io/#/projects?project=enhance_coala-quickstart to extract information
- https://github.com/coala/coala-quickstart/issues/79
- https://github.com/coala/coala-quickstart/issues/42

0 comments on commit eb4f194

Please sign in to comment.