<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# Notion - Update Database with LinkedIn Company Info

**Tags:** #notion #database #update #linkedin #company #info

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

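**Description:** This notebook updates a Notion database of company names with information found on LinkedIn. It uses Google Search to find each company's LinkedIn URL, then `naas_drivers.linkedin` to fetch the company details and `naas_drivers.notion` to write them back to the database.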

**References:**
- [Notion Drivers](https://github.com/jupyter-naas/drivers/blob/main/naas_drivers/tools/notion.py)
- [LinkedIn Drivers](https://github.com/jupyter-naas/drivers/blob/main/naas_drivers/tools/linkedin.py)

## Input

### Import libraries

In [1]:
import naas
from naas_drivers import linkedin, notion
try:
    from googlesearch import search
except ImportError:
    # Install the package that provides `googlesearch` on first run
    !pip install google
    from googlesearch import search
import re
from datetime import datetime
import os
import pandas as pd
import requests

### Setup Variables
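- `LI_AT`, `JSESSIONID`: LinkedIn session cookies (`li_at` and `JSESSIONID`), read from the naas secrets `LINKEDIN_LI_AT` and `LINKEDIN_JSESSIONID`
- `notion_token`: [Notion token](https://www.notion.so/api/v3/get-token)
- `notion_database_key`: name of the database property that holds the company name; pages where it is empty are removed
- `force_update`: set to `True` to refresh companies whose LinkedIn URL already ends with a numeric ID
- `notion_database`: URL of the Notion database to update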

In [2]:
# Inputs
# -> LinkedIn
LI_AT = naas.secret.get("LINKEDIN_LI_AT")
JSESSIONID = naas.secret.get("LINKEDIN_JSESSIONID")
# -> Notion
notion_token = naas.secret.get("NOTION_TOKEN") or "YOUR_TOKEN"
notion_database_key = "Name"
force_update = False

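# Outputs
# Root folder for the CSV exports (assumed default, adjust to your workspace)
OUTPUTS_PATH = "outputs"
notion_database = "https://www.notion.so/naas-official/01080b7d915d4d2d80de73cbfc0674cb?v=f42a92de92974b57b54e8a787ff3ad29&pvs=4"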
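
The database ID passed to the Notion driver is the last path segment of this URL, with the `?v=...` view parameters removed. As a minimal sketch of that extraction using only the standard library (`extract_database_id` is a hypothetical helper for illustration, not part of `naas_drivers`, and it assumes the URL has the same shape as `notion_database` above):

```python
from urllib.parse import urlparse


def extract_database_id(notion_url: str) -> str:
    """Return the last path segment of a Notion database URL.

    Hypothetical helper: assumes the database ID is the final path
    segment, as in the `notion_database` URL of this notebook.
    """
    path = urlparse(notion_url).path  # the "?v=...&pvs=..." query string is not part of the path
    return path.rstrip("/").split("/")[-1]


url = "https://www.notion.so/naas-official/01080b7d915d4d2d80de73cbfc0674cb?v=f42a92de92974b57b54e8a787ff3ad29&pvs=4"
database_id = extract_database_id(url)
print(database_id)
```

Parsing the URL rather than splitting on `"?v="` keeps the result correct when the query parameters appear in a different order or are absent.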

## Model

### Get Notion DB

In [3]:
def create_notion_db(notion_database, key, token):
    # Get database
    database_id = notion_database.split("/")[-1].split("?v=")[0]
    pages = notion.connect(token).database.query(database_id, query={})

    # Init
    df_output = pd.DataFrame()
    
    # Loop on page
    for page in pages:
        # Get page_id
        page_id = page.id
        
        # Create dataframe from page
        df = page.df()
        
        # Remove empty pages
        page_title = df.loc[df.Name == key, "Value"].values[0]
        if page_title == "":
            notion.connect(token).blocks.delete(page_id)
            print(f"Page '{page_id}' empty => removed from database")
        else:
            # Pivot rows to columns
            columns = df["Name"].unique().tolist()
            new_df = df.copy()
            new_df = new_df.drop("Type", axis=1)
            new_df = new_df.T
            for i, c in enumerate(new_df.columns):
                new_df = new_df.rename(columns={c: columns[i]})
            new_df = new_df.drop("Name").reset_index(drop=True)

            # Add page ID
            new_df["PAGE_ID"] = page_id

            # Concat dataframe
            df_output = pd.concat([df_output, new_df])
    return df_output

df_notion = create_notion_db(
    notion_database,
    notion_database_key,
    notion_token
)
print("📊 Notion DB:", len(df_notion))
df_notion.head(1)

📊 Notion DB: 76


|   | LinkedIn | Country | Tags | Annual Revenue (m$) | Description | Followers | Staff Range | Website | Google Search | Industry | City | Revenue source | Name | PAGE_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  |  |  | False |  |  |  | Zoho Mail | a01b41b8-5bb7-4bbd-8037-782a2e8606e6 |
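
The "pivot rows to columns" step in `create_notion_db` turns one page's property rows into a single wide row like the one above. The same idea can be sketched in plain Python, without pandas or a Notion connection. The sample rows mirror the `Name`/`Type`/`Value` columns used in the notebook, but the `Type` values and the `pivot_page` helper are assumptions made for illustration:

```python
# Long-format properties for one page: one row per property.
# The "Type" values here are illustrative assumptions.
page_properties = [
    {"Name": "Name", "Type": "title", "Value": "Zoho Mail"},
    {"Name": "LinkedIn", "Type": "url", "Value": None},
    {"Name": "Google Search", "Type": "checkbox", "Value": False},
]


def pivot_page(properties, page_id):
    """Turn one page's property rows into a single wide record.

    Hypothetical helper: each property name becomes a key, the "Type"
    column is dropped, and the page ID is attached at the end.
    """
    record = {prop["Name"]: prop["Value"] for prop in properties}
    record["PAGE_ID"] = page_id
    return record


row = pivot_page(page_properties, "a01b41b8-5bb7-4bbd-8037-782a2e8606e6")
print(row)
```

Collecting one such record per page and building the DataFrame once at the end would avoid calling `pd.concat` inside the loop.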


## Output

### Get Company Info

In [4]:
def get_linkedin_url(company):
    # Init linkedinbio
    linkedinbio = None
    
    # Create query
    query = f"{company}+Linkedin"
    print("--> Google query:", query)
    
    # Search in Google
    for i in search(query, tld="com", num=10, stop=10, pause=2):
        # Raw string with escaped dots: any linkedin.com subdomain, then the company slug up to "?"
        pattern = r"https://.+\.linkedin\.com/company/[^?]+"
        result = re.search(pattern, i)

        # Return value if result is not None
        if result is not None:
            linkedinbio = result.group(0).replace(" ", "")
            return linkedinbio
    return linkedinbio

for index, row in df_notion.iterrows():
    company_name = row["Name"]
    google_search = row["Google Search"]
    page_id = row["PAGE_ID"]
    page = notion.connect(notion_token).page.get(page_id)
    lk_company_url = row["LinkedIn"]    
    if str(lk_company_url) == "None" and str(google_search) == "False":
        # Finding URL
        print("🔍 Finding LinkedIn URL for:", company_name)
        lk_company_url = get_linkedin_url(company_name)
        print("--> Result found:", lk_company_url)
        # Update page in Notion
        if lk_company_url:
            page.link("LinkedIn", lk_company_url)
            page.checkbox("Google Search", True)
            page.update()
            
    if lk_company_url != "None" and lk_company_url is not None:
        if not lk_company_url[-1:].isnumeric() or force_update:
            print("➡️ Update Info for:", company_name)
            # Get LinkedIn Company Info
            df_company = linkedin.connect(LI_AT, JSESSIONID).company.get_info(lk_company_url)
            company_name = df_company.loc[0, "COMPANY_NAME"]
            company_id = df_company.loc[0, "COMPANY_ID"]
            company_url = f"https://www.linkedin.com/company/{company_id}"
            company_industry = df_company.loc[0, "INDUSTRY"]
            company_website = df_company.loc[0, "WEBSITE"]
            company_desc = df_company.loc[0, "DESCRIPTION"]
            company_country = df_company.loc[0, "COUNTRY"]
            company_city = df_company.loc[0, "CITY"]
            company_staff = df_company.loc[0, "STAFF_RANGE"]
            company_followers = df_company.loc[0, "FOLLOWER_COUNT"]
            company_logo_url = df_company.loc[0, "LOGO_URL"]

            # Save dataframe
            company_name_c = company_name.replace(' ', '_')
            csv_name = f"{datetime.now().strftime('%Y%m%d')}_LINKEDIN_COMPANY_{company_id}.csv"
            output_path = os.path.join(OUTPUTS_PATH, "l4_insights", "organizations", company_name_c)
            # Create the nested output folders in one call
            os.makedirs(output_path, exist_ok=True)
            df_company.to_csv(os.path.join(output_path, csv_name), index=False)

            # Update Notion
            page.title("Name", company_name)
            page.link("LinkedIn", company_url)
            page.select("Industry", company_industry)
            page.link("Website", company_website)
            page.rich_text("Description", company_desc)
            page.select("Country", company_country)
            page.select("City", company_city)
            page.select("Staff Range", company_staff)
            page.number("Followers", int(company_followers))
            if company_logo_url != "None":
                notion.client.pages.update(
                    page_id=page.id, icon={"type": "external", "external": {"url": company_logo_url}}
                )
            page.update()

🔍 Finding LinkedIn URL for: Zoho Mail
--> Google query: Zoho Mail+Linkedin
--> Result found: https://in.linkedin.com/company/zoho-mail
➡️ Update Info for: Zoho Mail
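
The cell above relies on two URL checks: a regular expression that recognizes LinkedIn company pages among the Google results, and a test on the last character of the stored URL that decides whether a company still needs a refresh. The sketch below isolates both so they can be tried without Google, LinkedIn, or Notion access. `match_company_url` and `needs_update` are hypothetical helper names, the regex is an escaped variant of the pattern in `get_linkedin_url`, and the sample URLs with a query string or a numeric ID are made up:

```python
import re

# Any linkedin.com subdomain, then "/company/", then everything up to a "?".
COMPANY_URL = re.compile(r"https://.+\.linkedin\.com/company/[^?]+")


def match_company_url(url):
    """Return the company URL without query parameters, or None."""
    result = COMPANY_URL.search(url)
    return result.group(0) if result is not None else None


def needs_update(company_url, force_update=False):
    """Mirror the loop's condition for refreshing a company.

    A URL ending with a digit is treated as already rewritten to the
    `/company/{company_id}` form, so it is skipped unless forced.
    """
    if company_url is None or company_url == "None":
        return False
    return force_update or not company_url[-1:].isnumeric()


candidates = [
    "https://in.linkedin.com/company/zoho-mail",
    "https://www.linkedin.com/company/zoho-mail?trk=public",  # made-up query string
    "https://www.linkedin.com/in/florent-ravenel/",  # a profile, not a company
]
for url in candidates:
    print(url, "->", match_company_url(url))

print(needs_update("https://in.linkedin.com/company/zoho-mail"))
print(needs_update("https://www.linkedin.com/company/12345"))  # made-up numeric ID
```

One consequence of the last-character test is that a company whose slug happens to end with a digit is skipped unless `force_update` is `True`.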
