### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# a timeout keeps the cell from hanging when there is no outbound route
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation</font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>
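<details>
<summary><font size="2">Reading Environment Variables Without a KeyError (sketch)</font></summary>

Indexing `os.environ[...]` raises `KeyError` when a variable is not set, for example when the code runs outside a notebook session. The following is a minimal sketch, not part of the original tips: it uses the same five variable names listed above, while the helper name `read_session_vars` and the `environ` parameter are hypothetical.

```python
import os

# Variable names taken from the tip above; the helper itself is hypothetical.
SESSION_VARS = [
    "NB_SESSION_COMPARTMENT_OCID",
    "PROJECT_OCID",
    "USER_OCID",
    "TENANCY_OCID",
    "NB_REGION",
]


def read_session_vars(names=SESSION_VARS, environ=None):
    """Return {name: value or None} instead of raising on missing variables."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in names}


for name, value in read_session_vars().items():
    print(f"{name}: {value if value is not None else '<not set>'}")
```
</details>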

In [50]:
import requests
import pandas as pd
import numpy as np

In [112]:
# English-only documents (enforced with language=en in the request below)

In [113]:
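## api key for News API (read from the environment instead of hardcoding the secret)
import os
news_api_key = os.environ["NEWS_API_KEY"]  # NEWS_API_KEY is an assumed variable name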

In [118]:
#define subjects
list_of_subjects = ["Israel", "Immigration", "Elections", "NFL", "United Nations", "Gaza", "Biden", "Trump", "Republicans", "Democrats"]

columns = ['subject', 'source_id', 'name', 'author', 'title', 'url', 'urlToImage', 'publishedAt', 'content']

#collect one dataframe per subject and concatenate once at the end
frames = []

#loop over subjects: one request per subject (100 articles each in this run)
for subject in list_of_subjects:

    #pass the query as params so values such as "United Nations" are URL-encoded
    params = {"q": subject, "language": "en", "apiKey": news_api_key}
    response = requests.get("https://newsapi.org/v2/everything", params=params, timeout=30)
    response.raise_for_status()
    response_json = response.json()

    list_of_response = []

    #use a separate name for each article so the HTTP response is not shadowed
    for article in response_json['articles']:

        #get individual columns from data
        source_id = article['source']['id']
        name = article['source']['name']
        author = article['author']
        title = article['title']
        article_url = article['url']
        urlToImage = article['urlToImage']
        publishedAt = article['publishedAt']
        content = article['content']

        #add to list
        list_of_response.append([subject, source_id, name, author, title, article_url, urlToImage, publishedAt, content])

    #add this subject's dataframe to the collection
    frames.append(pd.DataFrame(list_of_response, columns=columns))

#DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df_original = pd.concat(frames, ignore_index=True)
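
The per-article extraction in the loop above can be isolated into a small function so it can be checked without calling the API. This is a minimal sketch, assuming the response shape the loop already relies on (an `articles` list whose items carry `source`, `author`, `title`, `url`, `urlToImage`, `publishedAt` and `content`). The function name `articles_to_rows` and the sample payload are hypothetical, and `.get` is used so that a missing field yields `None` rather than a `KeyError`.

```python
def articles_to_rows(subject, payload):
    """Flatten one response payload into rows in the notebook's column order."""
    rows = []
    for article in payload.get("articles", []):
        source = article.get("source") or {}  # tolerate a missing or null source
        rows.append([
            subject,
            source.get("id"),
            source.get("name"),
            article.get("author"),
            article.get("title"),
            article.get("url"),
            article.get("urlToImage"),
            article.get("publishedAt"),
            article.get("content"),
        ])
    return rows


# Made-up payload for illustration only; this is not real API output.
sample_payload = {
    "articles": [
        {
            "source": {"id": None, "name": "Example News"},
            "author": None,
            "title": "Example headline",
            "url": "https://example.com/story",
            "urlToImage": None,
            "publishedAt": "2024-01-01T00:00:00Z",
            "content": "Example body text",
        }
    ]
}

rows = articles_to_rows("Elections", sample_payload)
print(rows)
```

Each returned row has the same nine fields, in the same order, as the `columns` list, so the result could be passed straight to `pd.DataFrame(rows, columns=columns)`.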


In [119]:
df_original.head()

| | subject | source_id | name | author | title | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Israel | bbc-news | BBC News | https://www.facebook.com/bbcnews | Hamas command in north Gaza destroyed, Israel ... | https://www.bbc.co.uk/news/world-middle-east-6... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2024-01-06T20:36:02Z | The Israeli army says it has "completed the di... |
| 1 | Israel | bbc-news | BBC News | https://www.facebook.com/bbcnews | South Africa's genocide case against Israel: B... | https://www.bbc.co.uk/news/world-middle-east-6... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2024-01-12T20:59:40Z | The UN's top legal body has now heard two days... |
| 2 | Israel | bbc-news | BBC News | https://www.facebook.com/bbcnews | UN court to hear South Africa genocide case ag... | https://www.bbc.co.uk/news/world-middle-east-6... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2024-01-11T07:16:13Z | The UN's International Court of Justice will b... |
| 3 | Israel | bbc-news | BBC News | https://www.facebook.com/bbcnews | Khan Younis: Israel says forces have encircled... | https://www.bbc.co.uk/news/world-middle-east-6... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2024-01-23T21:15:25Z | The Israeli military says its ground forces ha... |
| 4 | Israel | bbc-news | BBC News | https://www.facebook.com/bbcnews | Israeli military says 21 soldiers killed in Gaza | https://www.bbc.co.uk/news/world-middle-east-6... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2024-01-23T07:00:02Z | The Israeli army says 21 of its soldiers have ... |


In [120]:
df_original.shape

(1000, 9)
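
Several of the subjects overlap (for example Israel and Gaza, or Biden and Trump), so the same article may appear under more than one subject. Below is a minimal sketch for checking per-subject counts and repeated URLs before exporting. `summarize_records` is a hypothetical helper and the sample records are made up. It works on plain dictionaries such as those returned by `df_original.to_dict("records")`.

```python
from collections import Counter


def summarize_records(records):
    """Return (articles per subject, sorted list of URLs seen more than once)."""
    per_subject = Counter(record["subject"] for record in records)
    url_counts = Counter(record["url"] for record in records)
    duplicated = sorted(url for url, count in url_counts.items() if count > 1)
    return dict(per_subject), duplicated


# Made-up records for illustration only.
sample_records = [
    {"subject": "Israel", "url": "https://example.com/a"},
    {"subject": "Israel", "url": "https://example.com/b"},
    {"subject": "Gaza", "url": "https://example.com/a"},
]

per_subject, duplicated = summarize_records(sample_records)
print(per_subject)
print(duplicated)

# In the notebook this could be called as:
# summarize_records(df_original.to_dict("records"))
```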

In [121]:
df_original.to_excel("news_2024_02_02.xlsx", index=False)