<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Notion - Get page
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/template.ipynb" target="_parent">
<img src="https://img.shields.io/badge/-Open%20in%20Naas-success?labelColor=000000&logo="/>
</a>

## Input

In [1]:
import requests
import pandas as pd
import json

TOKEN_API = 'secret_R1CrUGn8bx9itbJW0Fc9Cc0R9Lmhbnz2ayqEe0GhRPq'
PAGE_URL = 'https://www.notion.so/Tom-Simon-2ccdafe28955478b8c9d70bda0044c86'
_VERSION = '2021-05-13'

## Model

In [2]:
def create_headers(token_api, version):
    return {
            'Authorization': f'Bearer {token_api}',
            'Notion-Version': f'{version}',
        }

create_headers(TOKEN_API, _VERSION)

{'Authorization': 'Bearer secret_R1CrUGn8bx9itbJW0Fc9Cc0R9Lmhbnz2ayqEe0GhRPq',
 'Notion-Version': '2021-05-13'}

In [3]:
def get_id_from_url(page_url):
    # The ID is the last dash-separated chunk of the Notion URL
    return page_url.split('-')[-1]

get_id_from_url(PAGE_URL)

'2ccdafe28955478b8c9d70bda0044c86'
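
Splitting on `-` works for the URL above, but a link copied from Notion may carry a query string or a fragment after the ID. Below is a minimal sketch of a more tolerant variant, assuming the ID is always the last 32 hexadecimal characters of the URL path. The name `get_id_from_any_url` is hypothetical and is not part of the original notebook.

```python
import re
from urllib.parse import urlparse

def get_id_from_any_url(url):
    # Hypothetical helper: keep only the path, dropping any query string or fragment.
    path = urlparse(url).path.rstrip('/')
    # Remove dashes so that both compact and dashed (UUID-style) IDs are handled.
    compact = path.replace('-', '')
    # Assumption: the page ID is the 32 hex characters that end the path.
    match = re.search(r'[0-9a-fA-F]{32}$', compact)
    if match is None:
        raise ValueError(f'No 32-character page ID found in {url!r}')
    return match.group(0).lower()

page_id = get_id_from_any_url(
    'https://www.notion.so/Tom-Simon-2ccdafe28955478b8c9d70bda0044c86?v=abc'
)
print(page_id)
```

Raising a `ValueError` on a malformed URL is a design choice of this sketch; it surfaces the problem before any request is sent.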

### Get properties
I reuse the code implemented in `get_database.ipynb`.

Two functions build the page properties:
- simple: `create_simple_result`
- detailed: `create_detailed_result`

In [4]:
def get_raw_properties(token_api, page_url):
    page_id = get_id_from_url(page_url)
    url = f'https://api.notion.com/v1/pages/{page_id}'
    headers = create_headers(token_api, _VERSION)
    response = requests.get(url, headers=headers)
    print(response.status_code)
    return response.json()

page = get_raw_properties(TOKEN_API, PAGE_URL)

200


In [5]:
def extract_text(dictionary):
    # Objects such as select options or users expose 'name';
    # rich-text items expose 'plain_text'.
    if 'name' in dictionary:
        return dictionary['name']
    elif 'plain_text' in dictionary:
        return dictionary['plain_text']
    else:
        return ''

def extract_date(dictionary):
    '''
    For the moment, only the start of a date field is extracted.
    Example: {'id': 'prop_1', 'type': 'date', 'date': {'start': '2018-03-21', 'end': None}}
    '''
    return dictionary['start']

def extract_data(element):
    '''
    input: a dictionary describing a Notion property
    Example: {'id': 'W#4k', 'type': 'select', 'select': {'id': 'b305bd26-****-****-****-c78e2034db8f', 'name': 'Client', 'color': 'green'}}
    output: the value held by the property, usually a string ('Client' in the example)
    '''
    if isinstance(element, dict):
        # The payload is stored under a key named after the property type
        dict_type = element['type']
        information = element[dict_type]

        if isinstance(information, dict):
            if dict_type == 'date':
                return extract_date(information)
            else:
                return extract_text(information)

        elif isinstance(information, list):
            # Multi-valued properties are joined into one comma-separated string
            texts = [extract_text(elm) for elm in information]
            return ','.join(texts)
        else:
            # Plain values (numbers, emails, booleans, None) are returned unchanged
            return information
    else:
        return ''
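
The date helper above keeps only `start`. If the end of a date range is also needed, one option is sketched below. It assumes the `date` value has the `{'start': ..., 'end': ...}` shape shown in the docstring example. The function name `extract_date_range` and the `start/end` output format are hypothetical choices and are not part of the original notebook.

```python
def extract_date_range(date_value):
    # Hypothetical variant of extract_date: returns 'start' alone,
    # or 'start/end' when the field holds a range.
    if not date_value:
        # An empty date property comes back as None
        return ''
    start = date_value.get('start') or ''
    end = date_value.get('end')
    return f'{start}/{end}' if end else start

single_day = extract_date_range({'start': '2018-03-21', 'end': None})
date_range = extract_date_range({'start': '2018-03-21', 'end': '2018-03-25'})
print(single_day)
print(date_range)
```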

1. The simplest form keeps only the value of each property:

In [6]:
def create_simple_result(dictionary):
    raw_properties = dictionary['properties']
    return {key: extract_data(elm) for key,elm in raw_properties.items()}

create_simple_result(page)

{'Date': '2021-08-27',
 'Status': 'Completed',
 'Jobs': 'Rêveur  🚀,Savant fou',
 'Completion Time': 1,
 'Email': 'tom.simon@yahoo.com',
 'Interviewer': 'Axel Rasse',
 'Task': 'Offline Mode',
 'Name': 'Tom Simon'}

2. We can also choose to build a more detailed result:

In [7]:
def create_detailed_result(dictionary):
    result = dictionary.copy()
    result['properties'] = create_simple_result(dictionary)
    result.pop('url')
    result.pop('object')
    return result

from pprint import pprint
detailed_result = create_detailed_result(page)
pprint(detailed_result)


{'archived': False,
 'created_time': '2021-08-04T12:28:00.000Z',
 'id': '2ccdafe2-8955-478b-8c9d-70bda0044c86',
 'last_edited_time': '2021-08-05T09:54:00.000Z',
 'parent': {'database_id': 'd0bb915c-4cb4-422a-8767-9f3bb9658282',
            'type': 'database_id'},
 'properties': {'Completion Time': 1,
                'Date': '2021-08-27',
                'Email': 'tom.simon@yahoo.com',
                'Interviewer': 'Axel Rasse',
                'Jobs': 'Rêveur  🚀,Savant fou',
                'Name': 'Tom Simon',
                'Status': 'Completed',
                'Task': 'Offline Mode'}}


Let's combine everything into one function that returns the detailed result:

In [8]:
def get_page_properties(token_api, page_url):
    page = get_raw_properties(token_api, page_url)
    return create_detailed_result(page)

get_page_properties(TOKEN_API, PAGE_URL)

200


{'id': '2ccdafe2-8955-478b-8c9d-70bda0044c86',
 'created_time': '2021-08-04T12:28:00.000Z',
 'last_edited_time': '2021-08-05T09:54:00.000Z',
 'parent': {'type': 'database_id',
  'database_id': 'd0bb915c-4cb4-422a-8767-9f3bb9658282'},
 'archived': False,
 'properties': {'Date': '2021-08-27',
  'Status': 'Completed',
  'Jobs': 'Rêveur  🚀,Savant fou',
  'Completion Time': 1,
  'Email': 'tom.simon@yahoo.com',
  'Interviewer': 'Axel Rasse',
  'Task': 'Offline Mode',
  'Name': 'Tom Simon'}}

### Get content

In [9]:
def get_content(token_api, page_url):
    page_id = get_id_from_url(page_url)
    url = f'https://api.notion.com/v1/blocks/{page_id}/children'
    headers = create_headers(token_api, _VERSION)
    response = requests.get(url, headers=headers)
    
    print(response.status_code)
    return response.json()['results']

content = get_content(TOKEN_API, PAGE_URL)

200
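
The blocks returned by `get_content` are nested dictionaries, so reading the page text takes one more step. The sketch below assumes each block stores its rich text under `block[block['type']]['text']`, as in the responses this notebook receives with `Notion-Version` `2021-05-13`. The helper names `block_to_text` and `blocks_to_lines` are hypothetical, and the sample mirrors the first two blocks of the page used here.

```python
def block_to_text(block):
    # Each block stores its payload under a key named after its type.
    payload = block.get(block.get('type'), {})
    # Assumption: rich text lives under 'text' for this API version.
    rich_text = payload.get('text', []) if isinstance(payload, dict) else []
    return ''.join(part.get('plain_text', '') for part in rich_text)

def blocks_to_lines(blocks):
    # One string per block; blocks without text yield an empty string.
    return [block_to_text(block) for block in blocks]

sample_blocks = [
    {'object': 'block',
     'type': 'heading_1',
     'heading_1': {'text': [{'type': 'text', 'plain_text': 'User Feedback'}]}},
    {'object': 'block',
     'type': 'bulleted_list_item',
     'bulleted_list_item': {'text': []}},
]

lines = blocks_to_lines(sample_blocks)
print(lines)
```

With the real data, `blocks_to_lines(content)` would be called on the list returned by `get_content`.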


## Output

In [10]:
# Get the page properties (detailed result)
get_page_properties(TOKEN_API, PAGE_URL)

200


{'id': '2ccdafe2-8955-478b-8c9d-70bda0044c86',
 'created_time': '2021-08-04T12:28:00.000Z',
 'last_edited_time': '2021-08-05T09:54:00.000Z',
 'parent': {'type': 'database_id',
  'database_id': 'd0bb915c-4cb4-422a-8767-9f3bb9658282'},
 'archived': False,
 'properties': {'Date': '2021-08-27',
  'Status': 'Completed',
  'Jobs': 'Rêveur  🚀,Savant fou',
  'Completion Time': 1,
  'Email': 'tom.simon@yahoo.com',
  'Interviewer': 'Axel Rasse',
  'Task': 'Offline Mode',
  'Name': 'Tom Simon'}}

In [11]:
# Get the page content
get_content(TOKEN_API, PAGE_URL)

200


[{'object': 'block',
  'id': '5f61a979-86b3-424a-ace5-163dd8097967',
  'created_time': '2021-08-04T12:28:00.000Z',
  'last_edited_time': '2021-08-05T09:54:00.000Z',
  'has_children': False,
  'type': 'heading_1',
  'heading_1': {'text': [{'type': 'text',
     'text': {'content': 'User Feedback', 'link': None},
     'annotations': {'bold': True,
      'italic': True,
      'strikethrough': True,
      'underline': True,
      'code': True,
      'color': 'default'},
     'plain_text': 'User Feedback',
     'href': None}]}},
 {'object': 'block',
  'id': '42ad187b-ec7a-4cb7-808d-d42189cb348d',
  'created_time': '2021-08-04T12:28:00.000Z',
  'last_edited_time': '2021-08-04T12:28:00.000Z',
  'has_children': False,
  'type': 'bulleted_list_item',
  'bulleted_list_item': {'text': []}},
 {'object': 'block',
  'id': 'fdea2eac-8d3a-4308-b4b2-efbc8d902e66',
  'created_time': '2021-08-04T12:28:00.000Z',
  'last_edited_time': '2021-08-05T09:54:00.000Z',
  'has_children': False,
  'type': 'heading_1