### Converting Queries: Migrating Workflows from Factiva.com to Factiva Analytics

One common use case for Factiva Analytics is automating an existing Factiva.com workflow. To perform such a migration successfully, it is important to understand the differences between the two products.

### Content Sets

Factiva.com includes ~30k sources of a wide variety of types:

- Newspapers
- Newswires
- Magazines & Journals
- Research reports
- Government publications
- Think Tank and NGO publications
- Newsletters
- Blogs **(Not in Factiva Analytics)**
- Transcripts of television and radio programs **(Not in Factiva Analytics)**

Web-scraped blog content, transcripts, and certain other content aren't included in the ~10k sources available in Factiva Analytics.

If you are a Media Monitoring Organization, redistribute content to external clients, or otherwise work in partnership with Dow Jones to provide Factiva content to third parties, you may have access to a more limited content set. Please speak to your Dow Jones account rep for further information.

Because of this difference, a search in Factiva.com may not return the same results as the equivalent query in Factiva Analytics.

### Converting Searches

Factiva.com uses a different query language from Factiva Analytics. Factiva Analytics uses a SQL-like query language based on the data definitions shown [here](https://developer.dowjones.com/site/docs/factiva_apis/factiva_analytics_apis/factiva_snapshots_api/index.gsp#datadefinitions-2).

You can read more about basic Factiva Analytics searches [here](https://github.com/dowjones/factiva-sample-notebooks/blob/master/2.1_complex_large_queries.ipynb).

### Stream queries are slightly different

You'll notice that many Factiva Analytics where clauses use `REGEX` to perform text searches. There is a slight difference in syntax between Snapshot queries and Stream queries.

A Snapshot REGEX query will be in the below form:

```sql
REGEXP_CONTAINS(column_name, r'<regex_string>')
```

While a Stream REGEX query will be in this form:

```sql
REGEXP_LIKE(column_name, '<regex_string>')
```

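Where clauses can be converted from a Snapshot query to a Stream query using the substitution below. It only rewrites the `r'` prefix that directly follows a `REGEXP_CONTAINS(column,` call, so quoted values that happen to end in `r` are left untouched:

```python
import re
where = re.sub(r"REGEXP_CONTAINS\(\s*(\w+)\s*,\s*r'", r"REGEXP_LIKE(\1, '", where)
```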

### Common conversions

Below you'll find some common query conversions.

### Range of dates

**FQL**: This is usually set in the date dropdown, for example `Last 3 months`.

**SQL**: This works with standard SQL syntax. 

```sql
publication_datetime >= '2010-01-01 00:00:00'
```
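A relative range such as the date dropdown's has no direct equivalent, so the start date has to be computed before the query is built. The sketch below is one way to do that in Python. The helper name `last_n_days_clause` is hypothetical, it approximates "last 3 months" as 90 days, and it assumes that `publication_datetime` is compared in UTC, which you should confirm against your own data.

```python
from datetime import datetime, timedelta, timezone


def last_n_days_clause(days, now=None):
    """Build a publication_datetime filter covering the last `days` days.

    Assumption: timestamps are compared in UTC, starting at midnight.
    """
    now = now or datetime.now(timezone.utc)
    start = (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return f"publication_datetime >= '{start:%Y-%m-%d %H:%M:%S}'"


# Fixed reference time so the result is reproducible.
reference = datetime(2024, 4, 1, 12, 30, tzinfo=timezone.utc)
print(last_n_days_clause(90, now=reference))
# prints: publication_datetime >= '2024-01-02 00:00:00'
```

The returned string can be combined with other conditions using `AND`.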

### Filtering by source (`restrictor_codes`)

**FQL**: 

```
rst=(sfwsj or sflefig or sfechos or sfnyt)
```

**SQL (Snapshot)**:

```sql
REGEXP_CONTAINS(restrictor_codes, r'(?i)(^|,)(sfwsj|sflefig|sfechos|sfnyt)($|,)')
```

**SQL (Stream)**:

```sql
REGEXP_LIKE(restrictor_codes, '(?i)(^|,)(sfwsj|sflefig|sfechos|sfnyt)($|,)')
```



### Filtering by `language_code`

**FQL**:

```
la=(en or es or it)
```

**SQL**:

```sql
LOWER(language_code) IN ('en', 'es', 'it')
```

### Filtering by `company_codes_*`

**FQL**: FQL provides only one way to search for company codes:

```
fds=(ONLNFR or APPLC or NETFLI or GOOG)
```

**SQL**: The Factiva Analytics field names include several types of company code fields, along with the ability to search directly by `ISIN` or `CUSIP`. Two of the most popular are `company_codes_about`, which tags an article when it is primarily about the company, and `company_codes_occur`, which tags an article when the company is mentioned at all (thus casting a wider net).

**SQL (Snapshot)**:

```sql
REGEXP_CONTAINS(company_codes_about, r'(?i)(^|,)(onlnfr|applc|netfli|goog)($|,)')
```

**SQL (Stream)**:

```sql
REGEXP_LIKE(company_codes_about, '(?i)(^|,)(onlnfr|applc|netfli|goog)($|,)')
```

### Filtering by `industry_codes`

**FQL**:

```
in=(i1 or i25121 or i2567)
```

**SQL (Snapshot)**:

```sql
REGEXP_CONTAINS(industry_codes, r'(?i)(^|,)(i1|i25121|i2567)($|,)')
```

**SQL (Stream)**:

```sql
REGEXP_LIKE(industry_codes, '(?i)(^|,)(i1|i25121|i2567)($|,)')
```

### Filtering by the **region** the article is **about** (`region_codes`)

**FQL**:

```
re=(aust or spain or italy or usa or uk)
```

**SQL (Snapshot)**:

```sql
REGEXP_CONTAINS(region_codes, r'(?i)(^|,)(aust|spain|italy|usa|uk)($|,)')
```

**SQL (Stream)**:

```sql
REGEXP_LIKE(region_codes, '(?i)(^|,)(aust|spain|italy|usa|uk)($|,)')
```
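The code-list filters above (`rst`, `fds`, `in`, `re`) all follow the same pattern, so the conversion can be automated. The following is a minimal sketch, not an official tool: the function name `fql_codes_to_sql` and the `FQL_TO_COLUMN` mapping are hypothetical, the mapping only covers the prefixes shown in this guide, and `fds` is assumed to map to `company_codes_about`.

```python
import re

# Assumed mapping, taken from the examples in this guide.
FQL_TO_COLUMN = {
    "rst": "restrictor_codes",
    "fds": "company_codes_about",
    "in": "industry_codes",
    "re": "region_codes",
}


def fql_codes_to_sql(fql, stream=False):
    """Convert e.g. 'rst=(sfwsj or sfnyt)' into a Snapshot or Stream filter."""
    match = re.fullmatch(r"\s*(\w+)\s*=\s*\((.*)\)\s*", fql)
    if not match:
        raise ValueError(f"unsupported FQL expression: {fql!r}")
    prefix, body = match.groups()
    column = FQL_TO_COLUMN.get(prefix.lower())
    if column is None:
        raise ValueError(f"unknown FQL prefix: {prefix!r}")

    codes = []
    for code in re.split(r"\s+or\s+", body.strip(), flags=re.IGNORECASE):
        code = code.strip().lower()
        # Only plain alphanumeric codes are accepted, so nothing needs escaping.
        if not re.fullmatch(r"[a-z0-9]+", code):
            raise ValueError(f"unexpected code: {code!r}")
        if code not in codes:  # drop duplicates, keep the original order
            codes.append(code)

    pattern = "(?i)(^|,)(" + "|".join(codes) + ")($|,)"
    if stream:
        return f"REGEXP_LIKE({column}, '{pattern}')"
    return f"REGEXP_CONTAINS({column}, r'{pattern}')"


print(fql_codes_to_sql("re=(aust or spain or italy or usa or uk)"))
print(fql_codes_to_sql("fds=(ONLNFR or APPLC)", stream=True))
```

Expressions that mix operators (`and`, `not`) are out of scope for this sketch and raise a `ValueError`.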

### Filtering by the number of words (`word_count`)

**FQL**:

```
wc >= 250
```

**SQL**:

```sql
word_count >= 250
```

### Keyword searches

Translating keyword searches is where the differences can become the most stark.

##### Searching in different article locations

Factiva.com allows you to perform free-text searches or to specify where you'd like to search, for example `hd` for the headline, `lp` for the lead paragraph, or `hlp` for both.

Factiva Analytics handles this by separating the article into different fields. Every article includes a `title`, which corresponds to the headline; a `snippet`, which corresponds to the lead paragraph; and a `body`, which corresponds to the rest of the article, not including the `snippet`. You can use `REGEX` to search through these fields.
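As an illustration of that field mapping, the sketch below builds a Snapshot where clause for a single keyword. It is an assumption-laden example rather than a reference implementation: the helper `keyword_clause` is hypothetical, a free-text search is assumed to cover `title`, `snippet`, and `body`, and the `\b` word boundary is a choice made here to avoid matching inside longer words. A Stream query would also need the `REGEXP_LIKE` form described earlier.

```python
import re

# Mapping described in the text: hd -> title, lp -> snippet, hlp -> both.
# None stands for a free-text search (assumed to cover all three fields).
FQL_FIELD_TO_COLUMNS = {
    "hd": ["title"],
    "lp": ["snippet"],
    "hlp": ["title", "snippet"],
    None: ["title", "snippet", "body"],
}


def keyword_clause(keyword, fql_field=None):
    """Build a Snapshot filter matching `keyword` as a whole word."""
    if not re.fullmatch(r"\w+", keyword):
        raise ValueError("this sketch only handles a single word")
    columns = FQL_FIELD_TO_COLUMNS[fql_field]
    pattern = rf"(?i)\b{keyword}\b"  # case-insensitive, whole-word match
    parts = [f"REGEXP_CONTAINS({col}, r'{pattern}')" for col in columns]
    return "(" + " OR ".join(parts) + ")"


print(keyword_clause("tesla", "hlp"))
```

Wrapping the conditions in parentheses keeps the `OR` group intact when the clause is combined with other filters using `AND`.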

##### Mimicking Factiva.com operators

Here is how to reproduce some common Factiva.com operators:

| Operator | FQL | REGEX |
| ----------- | ----------- |----------- |

