## Importing the Event Registry module

To use Event Registry, you have to import the `eventregistry` module.
To install the module, run:

```
pip install eventregistry
```

In [44]:
from eventregistry import *
import json, os, sys

The main class for interacting with the Event Registry service is `EventRegistry`. Its constructor accepts an `apiKey` parameter, which you need to supply in order to make more than a trivial number of requests. To avoid passing `apiKey` every time, you can instead create a `settings.json` file and store it in the folder where the module is installed. The file should simply contain:

```
{
    "apiKey": "..."
}
```
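
The file can also be created from Python. The following is a minimal sketch, not part of the `eventregistry` module: `write_settings` is a hypothetical helper, and both the folder path and the key in the usage comment are placeholders you would replace with your own values.

```python
import json
from pathlib import Path


def write_settings(folder, api_key):
    """Write a settings.json file containing the API key into `folder`.

    Hypothetical helper for illustration; it only produces the JSON
    structure shown above and returns the path of the written file.
    """
    path = Path(folder) / "settings.json"
    path.write_text(json.dumps({"apiKey": api_key}, indent=4), encoding="utf-8")
    return path


# Example usage (placeholders, not real values):
# write_settings("/path/to/site-packages/eventregistry", "YOUR_API_KEY")
```

If the module is already installed, `os.path.dirname(eventregistry.__file__)` is one way to find the folder it lives in.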

In [45]:
er = EventRegistry(allowUseOfArchive=False)

found apiKey in settings file which will be used for making requests
Event Registry host: http://eventregistry.org
Text analytics host: http://analytics.eventregistry.org


## A few example queries

Get the most recent New York Times articles on the topic of Business that mention Donald Trump or Boris Johnson:

In [48]:
q = QueryArticlesIter(
    keywords = QueryItems.OR(["Donald Trump", "Boris Johnson"]),
    sourceUri= er.getSourceUri("New York Times"),
    categoryUri = er.getCategoryUri("Business"))

print("Number of results: %d" % q.count(er))

for art in q.execQuery(er, sortBy = "date", maxItems = 2):
    print(json.dumps(art, indent=4))

Number of results: 66
{
    "uri": "1250531359",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-09",
    "time": "10:44:00",
    "dateTime": "2019-09-09T10:44:00Z",
    "dataType": "news",
    "sim": 0.5921568870544434,
    "url": "https://www.nytimes.com/2019/09/09/business/dealbook/mit-media-lab-jeffrey-epstein.html",
    "title": "DealBook Briefing: How an M.I.T. Lab Hid Links to Jeffrey Epstein",
    "body": "Good Monday morning. Programming note: I'm going to be in conversation with Blackstone's Stephen Schwarzman about his new book, \"What It Takes: Lessons in the Pursuit of Excellence,\" the global economy and philanthropy for a DealBook TimesTalk in New York on Sept 16. Get your tickets here. (Was this email forwarded to you? Sign up here.)\n\nEpstein ties cost an M.I.T. official his job\n\nJoichi Ito, the head of M.I.T.'s Media Lab, resigned on Saturday after The New Yorker published an investigation into yearslong efforts by the research lab to disguise its

Articles about Apple written in Chinese or Arabic:

In [39]:
q = QueryArticlesIter(
    conceptUri = er.getConceptUri("Apple"),
    lang = QueryItems.OR(["ara", "zho"]))

for art in q.execQuery(er, sortBy = "date", maxItems = 2):
    print(json.dumps(art, indent=4))

{
    "uri": "1244491882",
    "lang": "ara",
    "isDuplicate": false,
    "date": "2019-09-08",
    "time": "18:12:00",
    "dateTime": "2019-09-08T18:12:00Z",
    "dataType": "news",
    "sim": 0,
    "url": "https://www.youm7.com/story/2019/9/8/\u0645\u0633\u062a\u062e\u062f\u0645\u0648-\u0647\u0630\u0647-\u0627\u0644\u0647\u0648\u0627\u062a\u0641-\u0623\u0643\u062b\u0631-\u0627\u0644\u0645\u062a\u0636\u0631\u0631\u064a\u0646-\u0645\u0646-\u0625\u0637\u0644\u0627\u0642-\u0623\u064a\u0641\u0648\u0646-11-\u0627\u0644\u062c\u062f\u064a\u062f/4408241",
    "title": "\u0645\u0633\u062a\u062e\u062f\u0645\u0648 \u0647\u0630\u0647 \u0627\u0644\u0647\u0648\u0627\u062a\u0641 \u0623\u0643\u062b\u0631 \u0627\u0644\u0645\u062a\u0636\u0631\u0631\u064a\u0646 \u0645\u0646 \u0625\u0637\u0644\u0627\u0642 \u0623\u064a\u0641\u0648\u0646 11 \u0627\u0644\u062c\u062f\u064a\u062f.. \u0627\u0639\u0631\u0641\u0647\u0645 - \u0627\u0644\u064a\u0648\u0645 \u0627\u0644\u0633\u0627\u0628\u0639",
    "body": "\u0

Largest recent events on the topic of Brexit:

In [43]:
q = QueryEventsIter(keywords = "Brexit")

for event in q.execQuery(er, sortBy = "size", maxItems = 1):
    print(json.dumps(event, indent=4))
    

{
    "uri": "eng-5028293",
    "concepts": [
        {
            "uri": "http://en.wikipedia.org/wiki/Boris_Johnson",
            "type": "person",
            "score": 100,
            "label": {
                "eng": "Boris Johnson"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Brexit",
            "type": "wiki",
            "score": 86,
            "label": {
                "eng": "Brexit"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/Parliament",
            "type": "wiki",
            "score": 79,
            "label": {
                "eng": "Parliament"
            }
        },
        {
            "uri": "http://en.wikipedia.org/wiki/United_Kingdom",
            "type": "loc",
            "score": 71,
            "label": {
                "eng": "United Kingdom"
            },
            "location": {
                "type": "country",
                "label": {
                    "eng": "Unit

## Auto-suggestion methods
Several API calls accept parameters that are unique identifiers, such as concepts, categories, and sources. If you only know the name or label of such an item, you can use the auto-suggest methods to obtain its unique identifier.

If you know that there is a category for Investing, then you can get the URI for it like this:

In [15]:
er.getCategoryUri("investing")

'dmoz/Business/Investing'

Similarly, if you know the name of a source but not its exact domain name, you can look it up by name:

In [18]:
er.getSourceUri("new york times")

'nytimes.com'

For concepts, the URIs are the URLs of the corresponding Wikipedia pages.

In [20]:
er.getConceptUri("Obama")

'http://en.wikipedia.org/wiki/Barack_Obama'

Auto-suggestion also works for ticker symbols:

In [46]:
er.getConceptUri("AAPL")

'http://en.wikipedia.org/wiki/Apple_Inc.'
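
Because concept URIs are Wikipedia page URLs, a readable title can be recovered from them with the standard library alone. This is a small sketch; `wikipedia_title` is a hypothetical helper, not a function of the `eventregistry` module, and it assumes the URI has the `/wiki/<Title>` shape seen in the outputs above.

```python
from urllib.parse import unquote, urlparse


def wikipedia_title(concept_uri):
    """Return the page title encoded in a Wikipedia-style concept URI."""
    path = urlparse(concept_uri).path       # e.g. "/wiki/Barack_Obama"
    slug = path.rsplit("/", 1)[-1]          # keep only the last path segment
    return unquote(slug).replace("_", " ")  # decode %-escapes, underscores to spaces


print(wikipedia_title("http://en.wikipedia.org/wiki/Barack_Obama"))  # Barack Obama
```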

## Searching for articles

There are two classes that can be used for searching for articles - `QueryArticlesIter` and `QueryArticles`. Use `QueryArticlesIter` when you simply want to download articles matching a query. `QueryArticles` can instead be used when you need to download various summaries of the results, like top concepts, top sources, top authors, etc.

Both classes allow you to specify several filters in the constructor, such as:
- `keywords` - find articles that mention the keywords or phrases
- `conceptUri` - find articles that mention the concept(s)
- `categoryUri` - find articles that are about the given category(s)
- `sourceUri` - find articles written by the given publisher(s)
- `sourceLocationUri` - find articles written by publishers located in the given location
- `authorUri` - find articles written by the given author(s)
- `locationUri` - find articles that mention the given location in the dateline
- `lang` - find articles written in the given language
- `dateStart` - find articles that were written on the given date or later (in the `YYYY-MM-DD` format)
- `dateEnd` - find articles that were written before or on the given date (in the `YYYY-MM-DD` format)
- `keywordsLoc` - if keywords are provided, where to search for them: `title` or `body` (default)
- `minSentiment`, `maxSentiment` - min and max value of the sentiment (from -1 to 1)
- `startSourceRankPercentile` - starting percentile of the sources to consider in the results (default: 0). Value should be in range 0-90 and divisible by 10.
- `endSourceRankPercentile` - ending percentile of the sources to consider in the results (default: 100). Value should be in range 10-100 and divisible by 10.
- `ignoreKeywords`, `ignoreConceptUri`, `ignoreCategoryUri`, ... - from the articles that match the rest of the conditions, exclude the articles that match any of the provided filters
- `dataType` - which data types should be included in the results - `news` (default), `blog` or `pr`

When multiple filters are specified, the results have to match **all** of the provided filters. For example, when keywords and sources are specified, the results will be articles written by these sources that mention the provided keywords.
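
As a sketch of how several filters might be assembled, the helper below builds a dictionary of constructor arguments with correctly formatted dates and checks the percentile constraints listed above. `build_article_filters` is a hypothetical name used only for illustration; the dictionary keys are the filter names from the list.

```python
from datetime import date, timedelta


def build_article_filters(keywords, days_back, today=None,
                          start_percentile=0, end_percentile=100):
    """Build filter arguments; results have to match all of them."""
    # Percentiles must lie in the documented range and be divisible by 10.
    for name, value, low, high in (
        ("startSourceRankPercentile", start_percentile, 0, 90),
        ("endSourceRankPercentile", end_percentile, 10, 100),
    ):
        if not (low <= value <= high) or value % 10 != 0:
            raise ValueError("%s must be in %d-%d and divisible by 10" % (name, low, high))
    today = today or date.today()
    return {
        "keywords": keywords,
        "dateStart": (today - timedelta(days=days_back)).isoformat(),  # YYYY-MM-DD
        "dateEnd": today.isoformat(),                                  # YYYY-MM-DD
        "startSourceRankPercentile": start_percentile,
        "endSourceRankPercentile": end_percentile,
    }


filters = build_article_filters("Brexit", days_back=7, today=date(2019, 9, 9))
print(filters["dateStart"], filters["dateEnd"])  # 2019-09-02 2019-09-09
# With the eventregistry module imported: q = QueryArticlesIter(**filters)
```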

If you want to run a search where **any** of the specified filters may match, you have to use the [Advanced Query Language](https://github.com/EventRegistry/event-registry-python/wiki/Searching-for-articles#advanced-query-language).

### Using `QueryItems.AND()` and `QueryItems.OR()` when providing a list of filters of the same type
When you want to provide several keywords, concepts, categories, etc., you have to state explicitly whether the results should mention **all** of them or **any** of them.

To do that, use the `QueryItems.AND()` and `QueryItems.OR()` methods:

In [47]:
q = QueryArticlesIter(keywords = QueryItems.OR(["Samsung", "Apple", "Google"]))
print("Count with any of the companies: %d" % q.count(er))

q = QueryArticlesIter(keywords = "Samsung")
print("Count mentioning Samsung: %d" % q.count(er))

Count with any of the companies: 216551
Count mentioning Samsung: 38654
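
To make the difference between **any** and **all** concrete without calling the service, here is a toy illustration on a few made-up titles. It uses plain case-insensitive substring matching, which is an assumption for the sake of the example and not how Event Registry matches keywords; on the service side, the corresponding choice is between `QueryItems.OR([...])` and `QueryItems.AND([...])`.

```python
def mentions_any(text, keywords):
    """True if the text mentions at least one keyword (OR semantics)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


def mentions_all(text, keywords):
    """True if the text mentions every keyword (AND semantics)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


# Made-up titles, used only to illustrate the two semantics.
titles = [
    "Samsung unveils a new phone",
    "Apple and Google settle dispute",
    "Samsung, Apple and Google face probe",
    "Markets close higher",
]
companies = ["Samsung", "Apple", "Google"]

print(sum(mentions_any(t, companies) for t in titles))  # 3
print(sum(mentions_all(t, companies) for t in titles))  # 1
```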


### Retrieving different properties about articles
When retrieving articles, you can retrieve a lot of properties. Some properties are not returned by default, such as the lists of mentioned *concepts, categories, links, videos*, etc.

To modify which properties are returned, specify the `returnInfo` parameter of type `ReturnInfo`. With `ReturnInfo` you can specify which properties will be returned for each kind of returned object, such as articles, concepts, categories, events, ...

```QueryArticlesIter(..., returnInfo = ReturnInfo(...))```

`ReturnInfo` and its available parameters are described in detail [here](https://github.com/EventRegistry/event-registry-python/wiki/ReturnInfo-class).

In [53]:
retInfo = ReturnInfo(
    articleInfo = ArticleInfoFlags(),             # details about the articles to return
    eventInfo = EventInfoFlags(),                 # details about the events to return
    sourceInfo = SourceInfoFlags(),               # details about the news sources to return
    categoryInfo = CategoryInfoFlags(),           # details about the categories to return
    conceptInfo = ConceptInfoFlags(),             # details about the concepts to return
    locationInfo = LocationInfoFlags(),           # details about the locations to return
    storyInfo = StoryInfoFlags(),                 # details about the stories to return
    conceptClassInfo = ConceptClassInfoFlags(),   # details about the concept classes to return
    conceptFolderInfo = ConceptFolderInfoFlags()) # details about the concept folders to return
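
Since some properties are only present when requested, code that consumes the returned article dictionaries may want to read optional fields defensively. The sketch below is one way to do that; `summarize_article` is a hypothetical helper, the sample dictionary is abbreviated from fields that appear in this notebook's outputs, and the `label.eng` shape of article concepts is assumed to match the concept objects shown in the event output.

```python
from datetime import datetime


def summarize_article(art):
    """Pull a few fields out of an article dict, tolerating missing optional ones."""
    # dateTime values in the outputs look like "2019-09-07T16:16:00Z".
    published = datetime.strptime(art["dateTime"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "uri": art["uri"],
        "title": art["title"],
        "published": published,
        # "source" and "concepts" are read with .get() so absent keys do not raise.
        "source": art.get("source", {}).get("title"),
        "concepts": [c["label"]["eng"] for c in art.get("concepts", [])
                     if "eng" in c.get("label", {})],
    }


sample = {
    "uri": "1243385784",
    "dateTime": "2019-09-07T16:16:00Z",
    "title": "Checking In at Trump Hotels, for Kinship (and Maybe Some Sway)",
    "source": {"uri": "nytimes.com", "title": "The New York Times"},
}
summary = summarize_article(sample)
print(summary["published"].year, summary["source"], summary["concepts"])
```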

An example query that will return the list of concepts, the categories, the source location, and a list of potential duplicates of the article:

In [52]:
q = QueryArticlesIter(keywords = "Trump", sourceUri = "nytimes.com")
for art in q.execQuery(er, 
        maxItems = 1,
        returnInfo = ReturnInfo(
            articleInfo=ArticleInfoFlags(concepts=True, categories=True, duplicateList=True, location=True, bodyLen=300),
            sourceInfo=SourceInfoFlags(location=True, image=True)            
          )):
    print(json.dumps(art, indent=4))

{
    "uri": "1243385784",
    "lang": "eng",
    "isDuplicate": false,
    "date": "2019-09-07",
    "time": "16:16:00",
    "dateTime": "2019-09-07T16:16:00Z",
    "dataType": "news",
    "sim": 0.4705882370471954,
    "url": "https://www.nytimes.com/2019/09/07/us/politics/trump-hotel.html",
    "title": "Checking In at Trump Hotels, for Kinship (and Maybe Some Sway)",
    "body": "WASHINGTON -- At a table in the lobby bar of the Trump International Hotel this week, the final details of a black-tie, 40th anniversary gala for the Concerned Women for America were being worked out by the conservative group's staff.\n\nThere was the contract with the president's hotel to be ...",
    "source": {
        "uri": "nytimes.com",
        "dataType": "news",
        "title": "The New York Times",
        "location": {
            "type": "place",
            "label": {
                "eng": "New York City"
            },
            "country": {
                "type": "country",
            