## Investigate Projects

**Author**: elisabettai <br>
**Last change date**: 6th June 2023

This notebook provides code snippets that answer useful questions about user projects.

The cell below connects to the database and loads the `projects` table. The resulting object (`projects_table`) can be reused to answer different questions.

In [32]:
%%capture

# Install dependencies
import sys
!{sys.executable} -m pip install sqlalchemy
!{sys.executable} -m pip install psycopg2-binary
# asyncio ships with the Python standard library, so it is not installed from PyPI
!{sys.executable} -m pip install aiopg
!{sys.executable} -m pip install typer
!{sys.executable} -m pip install pandas

# Input access variables (read from the environment)
import getpass
import os

import sqlalchemy as db

PG_PASSWORD = os.environ.get('POSTGRES_PASSWORD')
PG_ENDPOINT = os.environ.get('POSTGRES_ENDPOINT')
PG_DB = os.environ.get('POSTGRES_DB')
PG_USER = os.environ.get('POSTGRES_USER')

# Build the connection URL (POSTGRES_ENDPOINT is expected in the form "host:port")
pg_host, pg_port = PG_ENDPOINT.split(":")
pg_engine_url = "postgresql://{user}:{password}@{host}:{port}/{database}".format(
    user=PG_USER,
    password=PG_PASSWORD,
    database=PG_DB,
    host=pg_host,
    port=int(pg_port),
)

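# Connect to the database and reflect the existing `projects` table
engine = db.create_engine(pg_engine_url)
connection = engine.connect()
metadata = db.MetaData()
# `autoload_with` alone triggers reflection; the `autoload=True` flag was removed in SQLAlchemy 2.0
projects_table = db.Table('projects', metadata, autoload_with=engine)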
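
The URL above is built with plain string formatting, which can break if the password contains characters such as `@` or `/`. As a minimal sketch, assuming the endpoint always has the form `host:port`, the same URL could be assembled with percent-encoded credentials. The helper name `build_pg_url` and the example values are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_pg_url(user: str, password: str, endpoint: str, database: str) -> str:
    """Build a PostgreSQL URL from a "host:port" endpoint (hypothetical helper)."""
    # Split on the last colon so the port is always the final component
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {endpoint!r}")
    # Percent-encode the credentials so '@' or '/' cannot be mistaken for URL delimiters
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{int(port)}/{database}"
    )


# Example with made-up credentials
print(build_pg_url("alice", "p@ss/word", "localhost:5432", "exampledb"))
# prints postgresql://alice:p%40ss%2Fword@localhost:5432/exampledb
```

The returned string can be passed to `db.create_engine` in the same way as `pg_engine_url`.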

### **Question 1**: How many Studies are shared (and not only private to the owner), and what percentage of all Studies is that?

**Principle**: a project is shared when its `access_rights` column holds more than one entry.
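
The principle can be tried out on in-memory data before querying the database. The sketch below assumes that each `access_rights` value is a dictionary keyed by group ID, with one entry per group that has access. That shape, the sample rows, and the helper names `is_shared` and `share_stats` are illustrative assumptions, not taken from the actual table.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape: {group_id: {"read": bool, "write": bool, "delete": bool}}
AccessRights = Dict[str, Dict[str, Any]]


def is_shared(access_rights: AccessRights) -> bool:
    """A project is treated as shared when more than one group has access."""
    return len(access_rights) > 1


def share_stats(rows: List[AccessRights]) -> Tuple[int, int, float]:
    """Return (number shared, total, percentage shared) for a list of access_rights values."""
    total = len(rows)
    shared = sum(1 for rights in rows if is_shared(rights))
    # Guard against an empty table to avoid a division by zero
    percentage = shared / total * 100 if total else 0.0
    return shared, total, percentage


# Made-up sample: two private projects and two shared ones
owner_only = {"read": True, "write": True, "delete": True}
read_only = {"read": True, "write": False, "delete": False}
sample = [
    {"3": owner_only},
    {"3": owner_only, "7": read_only},
    {"5": owner_only},
    {"5": owner_only, "1": read_only},
]

shared, total, percentage = share_stats(sample)
print(f"{shared} of {total} shared ({round(percentage, 1)}%)")
# prints 2 of 4 shared (50.0%)
```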

In [30]:
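# select(column) and connection.execute() work on SQLAlchemy 1.4 and 2.0;
# select([...]) and engine.execute() were removed in 2.0
query = db.select(projects_table.c.access_rights)
all_projects = connection.execute(query).fetchall()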

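# A project counts as shared when access_rights has more than one entry
num_shared = sum(1 for row in all_projects if len(row[0]) > 1)
num_projects = len(all_projects)
# Avoid a division by zero if the table is empty
perc_shared = num_shared / num_projects * 100 if num_projects else 0.0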

print(f"Answer: Out of {num_projects} Studies, {num_shared} are shared. This corresponds to {round(perc_shared, 1)}% of the Studies.")

Answer: Out of 24176 Studies, 281 are shared. This corresponds to 1.2% of the Studies.
