<hr>
<hr>
>> Scratch Notes

TODO:
- Clarify Syft vs PySyft terminology
- Fix broken links to sections in Table of Contents
- Add links to next notebook?
- Ask Irina/Andrew to review which links/projects to highlight
- Links for XChurch:
    - https://www.amcham.co.nz/Latest-News/12928098, 
    - https://www.christchurchcall.com/media-and-resources/news-and-updates/,
    - https://unstats.un.org/wiki/display/UGTTOPPT/14.+Twitter+and+OpenMined%3A+Enabling+Third-party+Audits+and+Research+Reproducibility+over+Unreleased+Digital+Assets, 
    - https://blogs.microsoft.com/on-the-issues/2022/09/20/christchurch-call-responsible-ai-online-extremism/
- Ask if we still want to link to GitHub ReadMe as installation instructions
    - Since Docs are version specific and ReadMe only points to most recent version, might be a bad idea unless we link to OpenMined/PySyft/0.8.x in an automated fashion, etc
    - However, this would help centralize all instructions and create a single source of truth
- Figure out how to set up Google Colab 1-click deploy
- Add link to GitHub or somewhere showing current OS support status?


IDEAS:
- Add notebook before this one as "Our Vision" or something of that sort?



Contents:
- Intro: PySyft as a mailbox for code
    - **Emphasize:** access 100-1000x more data across every scientific field
    - Depending on how they've set up their mailbox, you may get your results back immediately, or after a code review.
- Quickstart:
    - Installation steps 
        - Link to GitHub Readme?
        - Python, virtual environments, etc?
    - Syft in 10 minutes Example?


Irina Proposed contents:

- How PySyft facilitates collaboration brief summary
- Quickstart -> link to colab env
- Install steps for all platforms separate even if repetitive
- Resources for installing Python3.9+ for older OS versions.
- Creating a venv using domain’s libs versions
- Connecting to a domain and using the Python API

<hr>
<hr>

<h1><center> Getting Started: Data Science with PySyft! </center></h1>
<b> PySyft (v0.8.2): Data Scientist Documentation Notebook 1 </b>

<img src="../_images/title_syft_light.png"></img>


Welcome!
PySyft is an <a href="https://github.com/openmined/pysyft">open source library </a> that lets you perform data science on data that is located on someone else's server.

Whether you're a data scientist, machine learning engineer, or looking to audit a large language model (LLM) or its training data, PySyft empowers you to perform your analyses on sensitive data without compromising privacy or security.

It has already been used in notable projects, including <a href="https://blog.openmined.org/announcing-our-partnership-with-twitter-to-advance-algorithmic-transparency/">Twitter's work on algorithmic transparency</a>, the Christchurch Call to combat extremist content [TO DO: FIND LINKS], and collaborations with the United Nations [FIND LINK].

This notebook is the first of many, and will cover the following:
- Introduction
    - PySyft as a mailbox for Code
    - Steps for using PySyft
- Quickstart: Setting up and Installation
    - PySyft with one click: Google Colab
    - Installation
        - Python
        - Virtual Environments
    - Connecting to a Domain Node
    
    
The following notebook will pick up from there, and will teach you how to inspect datasets on the domain node, and create code requests.

<hr>

# Introduction


## PySyft: Mailbox for Code
What _is_ PySyft?

The easiest way to understand PySyft is to think of it as a mailbox for code. PySyft is a framework that allows a data scientist to submit their analysis code to a data owner. 

Data owners, who hold large amounts of private data along with the infrastructure to work with it, can then review the submitted code for privacy, security, intellectual property (IP), and legal compliance. If they approve it, they execute it within their own controlled environment, so the sensitive data stays protected and isolated while still yielding valuable insights. After the analysis has run, the data owner securely shares the results with the data scientist through PySyft.

This can scale quite well:

- A single data scientist can submit the same piece of code to several data owners.
    - For example, imagine that each data owner is a hospital holding data about a particular type of cancer. The data scientist could then train an ML model on cancer data stored in hospitals all over the world, without any hospital having to publicly disclose any information.
- A data scientist can also ask for the same piece of code to be run multiple times against a single data owner's data.
    - Imagine the data scientist is training a neural network on a data owner's data, and needs to train for multiple epochs on a single dataset.
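
To make the mailbox analogy concrete, here is a toy sketch in plain Python. It does not use the PySyft API at all: the `DataOwner` class, its allow-list `review` rule, the `mean_tumor_size` function, and the sample records are hypothetical stand-ins, chosen only to illustrate the submit, review, execute, and return-results flow described above. In practice the review step is a judgment made by the data owner, not a name check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Analysis = Callable[[List[float]], float]


@dataclass
class DataOwner:
    """Hypothetical stand-in for a data owner; NOT part of the PySyft API."""

    name: str
    private_data: List[float]
    inbox: List[Analysis] = field(default_factory=list)

    def submit(self, code: Analysis) -> int:
        """The data scientist drops code in the mailbox; returns a request id."""
        self.inbox.append(code)
        return len(self.inbox) - 1

    def review(self, code: Analysis) -> bool:
        """Toy review rule (an assumption): approve only allow-listed functions."""
        return code.__name__ in {"mean_tumor_size"}

    def run(self, request_id: int) -> Optional[float]:
        """Run approved code next to the data; only the result leaves."""
        code = self.inbox[request_id]
        if not self.review(code):
            return None  # rejected: the code is never executed
        return code(self.private_data)


def mean_tumor_size(records: List[float]) -> float:
    """The analysis the data scientist wants to run."""
    return sum(records) / len(records)


# One data scientist, several data owners, the same piece of code.
hospitals = [
    DataOwner("hospital-a", [1.0, 2.0, 3.0]),
    DataOwner("hospital-b", [4.0, 6.0]),
]
results = {}
for owner in hospitals:
    request_id = owner.submit(mean_tumor_size)
    results[owner.name] = owner.run(request_id)

print(results)  # {'hospital-a': 2.0, 'hospital-b': 5.0}
```

Note that the data scientist only ever sees `results`; the `private_data` lists never leave their owners.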
    
## Working with PySyft

Working with PySyft is quite simple, and for a Data Scientist, usually involves the following steps:

- Find a domain node with data you want to work on
- Create an account on, or log in to, that domain node
- Inspect the datasets on that domain node
- Write the analysis you'd like to run on the data
    - You can use any Python library you're already familiar with: NumPy, PyTorch, JAX, and so on!
- Submit your code to the domain node

... and after that, you'd either get the result of your analysis back, or you'd get feedback from the data owner about any runtime errors or other issues with your code. Simple as that!

<hr>

# Installation

Before you can start using PySyft, you have to install it. This is fairly straightforward, and we have cross-platform support for Linux, macOS, and Windows! [TODO: Add link to GitHub showing current support status?]

**Before we begin:** 
- This version of PySyft (0.8.2) requires Python 3.9-3.11. Instructions for installing Python are available in the [Python wiki's Beginner's Guide](https://wiki.python.org/moin/BeginnersGuide/Download).
- Before installing Syft, it helps to create a virtual environment, which isolates the Syft library and its dependencies from the rest of your system. A helpful primer on virtual environments can be found [here](https://realpython.com/python-virtual-environments-a-primer/).
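
If you are unsure whether your setup meets these prerequisites, a quick check along the following lines may help. This is a minimal sketch using only the standard library: the helper names `python_is_supported` and `in_virtual_env` are our own, and the virtual-environment check assumes a `venv`-style environment (it will likely not detect conda environments).

```python
import sys

# Supported range stated above for PySyft 0.8.2 (inclusive on both ends).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is within the supported range."""
    return MIN_SUPPORTED <= tuple(version_info[:2]) <= MAX_SUPPORTED


def in_virtual_env() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {current} supported: {python_is_supported()}")
print(f"Inside a virtual environment: {in_virtual_env()}")
```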


Once you have that out of the way, Syft can be installed in one line using `pip`. Simply run `pip install syft` in your terminal.

To install a specific version, pin it with `==`, as in `pip install syft==<version>`.

You can then import syft freely using:

In [1]:
import syft

For simplicity, we often `import syft as sy`. 
Let's check our syft version:

In [2]:
syft.__version__

'0.8.2-beta.47'

At the moment, Syft is not backwards compatible with older versions (for example, 0.8.0 isn't backwards compatible with 0.7.0).
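
Because of this, it can be worth comparing your installed version against the version a data owner tells you their domain node runs. The sketch below is an illustration only: `release_line`, `definitely_mismatched`, and `installed_syft_version` are hypothetical helpers, not Syft functions, and they assume that compatibility breaks at least at `major.minor` boundaries, as in the 0.7.0 vs 0.8.0 example. A matching `major.minor` does not guarantee compatibility.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_line(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.8.2-beta.47'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def definitely_mismatched(local: str, remote: str) -> bool:
    """True when the two versions sit on different major.minor release lines."""
    return release_line(local) != release_line(remote)


def installed_syft_version() -> Optional[str]:
    """Look up the installed syft version without importing the package."""
    try:
        return metadata.version("syft")
    except metadata.PackageNotFoundError:
        return None  # syft is not installed in this environment


print(installed_syft_version())
print(definitely_mismatched("0.8.2-beta.47", "0.7.0"))  # True
```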

<hr>

# Connecting to a Domain Node

Once you have PySyft installed, the next step is to log in to a **domain node**. Domain nodes are deployed by data owners who want to allow data scientists to access their data for analyses. They are essential components of PySyft, allowing you to interact with remote datasets in a secure manner.

So let's say a domain node has been launched via the following commands:

In [3]:
node = syft.orchestra.launch(name="private-datasets-cancer", port=8062)

Starting private-datasets-cancer server on 0.0.0.0:8062
Waiting for server to start.. Done.


In [4]:
data_owner_client = node.login(email="info@openmined.org", password="changethis")

Logged into <private-datasets-cancer: High-side Domain> as GUEST
Logged into <private-datasets-cancer: High side Domain> as <info@openmined.org>


At this point, the Data Owner has launched their domain node. Let's say they create an account for you with the following credentials:

In [5]:
data_owner_client.register(
    name="Jane Doe", 
    email="jane@caltech.edu",
    password="abc123",
    password_verify="abc123"
)

Logging in to the domain node is then as straightforward as entering your email and password:

In [6]:
guest_client = node.client.login(email="jane@caltech.edu", password="abc123")

Logged into <private-datasets-cancer: High-side Domain> as GUEST
Logged into <private-datasets-cancer: High side Domain> as <jane@caltech.edu>


... And we're in!

<hr>

That concludes this first notebook. In our next one, we'll see how to inspect and use the datasets on this domain node.

See you there!