# API

In [None]:
import arxiv

search = arxiv.Search(
    query="ti:computational AND ti:social AND ti:science",
    max_results=20,
    sort_by=arxiv.SortCriterion.Relevance,
)

# In arxiv 2.x, Search.results() is deprecated in favour of Client.results().
client = arxiv.Client()
for result in client.results(search):
    print(result.title)
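
The `arxiv` package wraps arXiv's public HTTP API. The sketch below shows roughly what happens underneath, assuming the documented `http://export.arxiv.org/api/query` endpoint and its `search_query`, `start`, `max_results` and `sortBy` parameters: the search becomes an encoded URL, and the response is an Atom XML feed. `build_query_url` and `parse_titles` are hypothetical helper names. The sample feed is made up so that the code runs offline without contacting arXiv.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of arXiv's public query API (it answers with an Atom XML feed).
ARXIV_API = "http://export.arxiv.org/api/query"
# Atom elements are namespaced; ElementTree expects the namespace in braces.
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(query, max_results=20, sort_by="relevance"):
    """Encode the search parameters as a query string for the arXiv API."""
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Return the title of every <entry> in an Atom feed, whitespace collapsed."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.find(f"{ATOM}title")
        titles.append(" ".join(title.text.split()))
    return titles


# A tiny made-up feed standing in for a real API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First example
    title</title></entry>
  <entry><title>Second example title</title></entry>
</feed>"""

print(build_query_url("ti:computational AND ti:social AND ti:science"))
print(parse_titles(SAMPLE_FEED))
```

Fetching the printed URL in a browser, or with `requests`, should return a real feed that `parse_titles` can read in the same way.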

# Web 

Sometimes no API is available, and we need to scrape a webpage ourselves.

To do that, we need to understand the basics of how webpages are built.

<div style="text-align: center"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/HTML5_logo_and_wordmark.svg/1200px-HTML5_logo_and_wordmark.svg.png" style="width:100px;height:120px;"></div>

## HTML


- HTML (HyperText Markup Language) is the standard markup language for web pages.
- With HTML you can create your own website.
- HTML is not difficult, but real-world pages can be messy.


## Example

- This is a simple HTML document:

    ```html
    <!DOCTYPE html>
    <html>
    <head>
    <title>Page Title</title>
    </head>
    <body>

    <h1>This is a Heading</h1>
    <p>This is a paragraph.</p>

    </body>
    </html>
    ```


- In a web browser, the code above would be displayed like this:

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>This is a Heading</h1>
<p>This is a paragraph.</p>

</body>
</html>

- Or you can try it online in the [W3Schools Tryit Editor](https://www.w3schools.com/html/tryit.asp?filename=tryhtml_default).

- Or edit an HTML file on your computer:

    - Step 1: Open any text editor (for example, Notepad).
    - Step 2: Write or copy the following HTML code into the editor:

    ```html
    <!DOCTYPE html>
    <html>
    <body>

    <h1>My First Heading</h1>

    <p>My first paragraph.</p>

    </body>
    </html>
    ```
    - Step 3: Save the HTML Page
        - Name the file "index.htm" and set the encoding to UTF-8 (which is the preferred encoding for HTML files).
        
    - Step 4: View the HTML Page in Your Browser
        - Open the saved HTML file in your favorite browser (double-click the file, or right-click it and choose "Open with").

        - The result will look much like this:
        <div style="text-align: center"><img src="https://www.w3schools.com/html/img_chrome.png" style="width:500px;height:200px;"></div>
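
Browsers and scraping libraries read a page like this as a tree of nested elements: `<head>` and `<body>` sit inside `<html>`, `<title>` sits inside `<head>`, and so on. The following minimal sketch uses only Python's standard-library `html.parser` module to print that nesting for the first example page. `OutlineParser` is a hypothetical helper name. It assumes every tag is explicitly closed, so void elements such as `<br>` or `<img>` would need extra handling.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record each start tag together with how deeply it is nested.

    Assumes every tag has a matching end tag (no void elements like <br>).
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # (depth, tag) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


PAGE = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading</h1>
<p>This is a paragraph.</p>
</body>
</html>"""

parser = OutlineParser()
parser.feed(PAGE)  # <!DOCTYPE html> is a declaration, not a start tag
parser.close()
for depth, tag in parser.outline:
    print("  " * depth + tag)
```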
        


## Elements

- An HTML element is defined by a start tag, some content, and an end tag:
```html
<tagname>Content goes here...</tagname>
```

- An HTML element is everything from the start tag to the end tag, for example:
```html
<h1>My First Heading</h1>
<p>My first paragraph.</p>
```


| Start tag | Element content | End tag |
|---|---|---|
| `<h1>` | My First Heading | `</h1>` |
| `<p>` | My first paragraph. | `</p>` |
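
The table can be reproduced programmatically. This sketch, again using only the standard-library `html.parser`, splits each element of the example into its start tag, content and end tag. `ElementParser` is a hypothetical name, and the sketch only handles flat, non-nested elements like the two above.

```python
from html.parser import HTMLParser


class ElementParser(HTMLParser):
    """Collect (start tag, content, end tag) rows for simple, non-nested elements."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._tag = None   # tag of the element currently open, if any
        self._text = []    # text pieces seen inside that element

    def handle_starttag(self, tag, attrs):
        self._tag = tag
        self._text = []

    def handle_data(self, data):
        # Ignore text that sits between elements (e.g. newlines).
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            content = "".join(self._text).strip()
            self.rows.append((f"<{tag}>", content, f"</{tag}>"))
            self._tag = None


parser = ElementParser()
parser.feed("<h1>My First Heading</h1>\n<p>My first paragraph.</p>")
parser.close()
for start, content, end in parser.rows:
    print(start, "|", content, "|", end)
```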


## CSS


- Cascading Style Sheets (CSS) is used to format the layout of a webpage.

- With CSS, you can control:
    - the color of text,
    - the font,
    - the size of text,
    - the spacing between elements,
    - how elements are positioned and laid out,
    - which background images or background colors are used,
    - how pages are displayed on different devices and screen sizes,
    - ...

- Inline CSS (the `style` attribute)
    - display the heading in blue and the paragraph in red:

    ```html

    <h1 style="color:blue;">A Blue Heading</h1>

    <p style="color:red;">A red paragraph.</p>

    ```

<h1 style="color:blue;">A Blue Heading</h1>

<p style="color:red;">A red paragraph.</p>
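
When scraping, an inline style is just another attribute of the element. The simplified sketch below uses only the standard library to read the `style` attribute of the two elements above and split it into property/value pairs. `parse_style` and `StyleParser` are illustrative names. The parsing is deliberately naive: it ignores CSS comments and values that themselves contain `;`, which a real CSS parser would handle.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string such as "color:blue;" into a dict.

    Naive: splits on ';' and ':' only, so it does not handle comments or
    values containing ';' (e.g. some data URIs).
    """
    declarations = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            prop, value = declaration.split(":", 1)
            declarations[prop.strip()] = value.strip()
    return declarations


class StyleParser(HTMLParser):
    """Collect (tag, style dict) for every element that has a style attribute."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style")
        if style:
            self.styles.append((tag, parse_style(style)))


parser = StyleParser()
parser.feed('<h1 style="color:blue;">A Blue Heading</h1>'
            '<p style="color:red;">A red paragraph.</p>')
parser.close()
print(parser.styles)
```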


## JavaScript

- JavaScript makes HTML pages more dynamic and interactive.

- Open the "clock.html" file in the HTML folder with your browser to see an example.
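
This matters for scraping: a plain HTTP request, as made with `requests` in the next section, returns the HTML before any JavaScript runs, so content that a script inserts later is missing. The sketch below uses a made-up page (not the actual `clock.html`) and the standard-library parser to show that the clock `<div>` is empty in the static HTML. Only the script's source code is there. `StaticView` is a hypothetical name.

```python
from html.parser import HTMLParser

# A made-up page: the <div> starts empty and is filled in only when a
# browser executes the script.
PAGE = """<html><body>
<h1>Clock</h1>
<div id="clock"></div>
<script>
document.getElementById("clock").textContent = new Date().toLocaleTimeString();
</script>
</body></html>"""


class StaticView(HTMLParser):
    """Separate the text present in the raw HTML from script source code."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not data.strip():
            return  # skip whitespace between tags
        if self.in_script:
            self.scripts.append(data.strip())
        else:
            self.text.append(data.strip())


view = StaticView()
view.feed(PAGE)
view.close()
print("Text in the static HTML:", view.text)     # no clock time here
print("Scripts that a browser would run:", len(view.scripts))
```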



## Simple web scraping

In [None]:
import requests
from bs4 import BeautifulSoup

URL = "https://arxiv.org/search/?query=computational+social+science&searchtype=title&source=header"
page = requests.get(URL, timeout=30)
page.raise_for_status()  # stop early on HTTP errors such as 404 or 503

soup = BeautifulSoup(page.content, "html.parser")
print(soup)
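
The search URL above is the arXiv search page plus a few query parameters (`query`, `searchtype`, `source`). Instead of editing the string by hand, one option is to build it with `urllib.parse.urlencode`. `search_url` is a hypothetical helper, and the parameter names are simply those visible in the URL above, not a documented API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://arxiv.org/search/"


def search_url(query, searchtype="title"):
    """Build an arXiv search-page URL from its query parameters."""
    params = {"query": query, "searchtype": searchtype, "source": "header"}
    return f"{BASE}?{urlencode(params)}"  # spaces are encoded as '+'


url = search_url("computational social science")
print(url)
# Decoding the query string shows the individual parameters again.
print(parse_qs(urlsplit(url).query))
```

Changing the search then only means changing the `query` argument.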

In [None]:
titles = soup.find_all('p', {'class':'title is-5 mathjax'})

In [None]:
for title in titles:
    # .text includes the surrounding newlines and indentation, so strip them.
    print(title.text.strip())
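
The same extraction can be sketched without `bs4`, using the standard library. This version also makes the class matching explicit: a `<p>` counts as a title if `title` is one of its classes, rather than requiring the exact string `title is-5 mathjax`. The HTML snippet is invented to resemble an arXiv search result. That resemblance is an assumption, and the real markup may change. `TitleParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <p> whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # A class attribute may be missing or valueless (None).
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "title" in classes:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            # Collapse newlines and runs of spaces inside the title.
            self.titles.append(" ".join("".join(self._buffer).split()))
            self._inside = False


# Invented snippet shaped like one search result.
SNIPPET = """
<li class="arxiv-result">
  <p class="title is-5 mathjax">
    A made-up <span class="search-hit">computational</span> title
  </p>
  <p class="authors">Someone</p>
</li>
"""

parser = TitleParser()
parser.feed(SNIPPET)
parser.close()
print(parser.titles)
```

Text inside nested tags such as the highlighted `<span>` is kept, because `handle_data` collects everything until the closing `</p>`.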

A useful tutorial for further study: https://realpython.com/beautiful-soup-web-scraper-python/