# 🏷 Managing Data with PySyft 0.9.1b
---

## 🖼 Scenario

You have collected a dataset that contains information about different individuals, their lifespans, and their occupations. Your group originally collected it to use machine learning and text-mining models to collect, predict, and cleanse census data from different countries, with a focus on age and gender. Using a five-step method, your team inferred birth and death years, binary gender, and occupation from citizen-submitted data across various national statistics office programs and news sites.

In the end, your dataset includes data on individuals from a variety of social groups, including but not limited to 107k females, 124 non-binary people, and 90k researchers, spread across more than 300 regions. You want to share it with other researchers because you believe several fields of study could benefit from a large, structured, and accurate dataset about occupation and lifespan. However, you do not want to expose any individual in the dataset.

## 😎 Mission

Your mission for this exercise is to protect the sensitive aspects of the dataset (_described below_) while allowing an external researcher to answer pertinent research questions using the data. This mission consists of two parts.

**Part One:**
- You will have to create mock data out of the dataset provided
- You will have to upload the dataset and corresponding mock data to the Datasite provided

**Part Two:**
- The _*External Researcher_ will submit 3 code requests for you to review
- You will need to review each request and decide whether to "Approve" or "Deny" them

\*_For the purposes of this test, the 'External Researcher' role will be played by our data scientist bot or by your moderator_

**Sensitive Properties**

We want to help the external researcher answer their questions without sharing sensitive information. For the purposes of this test, assume that the true values of the following properties are sensitive and therefore should not be shared with the external researcher:
- the "Birth year"
- the "Death year"
- the "Name"

_*Disclaimer: The dataset provided has been modified for the purposes of this test and is not an accurate reflection of the [source data](https://workshop-proceedings.icwsm.org/abstract?id=2022_82) collected in the ICWSM workshop_

### Helpful Resources
- [PySyft Documentation](https://docs.openmined.org/en/latest/index.html)
- [PySyft Repo](https://github.com/OpenMined/PySyft)

#####
---


# 🖥 Setup Test Environment

### Run in Colab (optional)
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/OpenMined/design/blob/main/user_tests/manage_data_091b/Managing%20Data%20with%20PySyft091b_A.ipynb) 

### 1. Install Syft

Before you begin the test, you will need to have PySyft installed. If you do not have it installed, you can run the cell below or follow this [Quick Install Guide](https://docs.openmined.org/en/latest/quick_install.html) to get started.

In [None]:
# The script below will install PySyft (beta) on your machine
#!pip install -U syft --pre

### 2. Deploy Test Env
Please run the cells below to deploy a local instance of the test environment.

In [None]:
# IMPORT DEPENDENCIES
import syft as sy
import pandas as pd

In [None]:
# DOWNLOAD TEST SCRIPT
!curl -O https://raw.githubusercontent.com/OpenMined/design/main/user_tests/manage_data_091b/manage_data_setup_a.py

In [None]:
# IMPORT TEST SCRIPT
import manage_data_setup_a as md_setup

In [None]:
# LAUNCH LOCAL SERVER
server = sy.orchestra.launch(
    name="U.S. Stats",
    reset=True,
    port=8080,
)

In [None]:
# CREATE EXTERNAL RESEARCHER PROFILE
md_setup.create_user(port=8080)

---

# 🚩 Begin Mission!

### Part One
Your goal is to upload your dataset so that an external researcher can form research questions about it without viewing any of its sensitive information. Please begin **Part One** of your mission below. The dataset you will be using is linked below. For help with uploading data and creating mock data, see [PySyft's Documentation Site](https://docs.openmined.org/en/latest/getting-started/part3-research-study.html).
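- [**Test Dataset**](https://raw.githubusercontent.com/OpenMined/design/main/user_tests/manage_data_091b/assets/test_dataset_real_revised.csv)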

In [None]:
# This is an optional curl line to download the dataset via the notebook
#!curl -O https://raw.githubusercontent.com/OpenMined/design/main/user_tests/manage_data_091b/assets/test_dataset_real_revised.csv

In [None]:
# Login to the Datasite as an "Admin"
admin_client = sy.login(email="info@openmined.org", password="changethis", port=8080)

In [None]:
# Begin mission to upload the dataset to your Datasite


In [None]:
# Add as many cells as you need ^_^

### End Part One 🙌
Great job completing Part One! Now that you have uploaded your dataset to your Datasite, it is time to let our external researcher propose projects based on it. Please run the cell below to begin the second portion of the test. If you run into any issues, please notify your moderator.

In [None]:
# Run External Researcher Code Submission Script
md_setup.code_submission(port=8080)

---
### Part Two
Our external researcher has submitted a project and code to run against your data. You can now review the submitted code and decide whether to approve or deny each request. Please use `#` comments to record your thought process as you review. If you get stuck, you can use [PySyft's Documentation Site](https://docs.openmined.org/en/latest/getting-started/part4-review-code-request.html) for assistance.
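As a first-pass check while reviewing, you might run a request's code against the mock data and see whether the result contains any of the sensitive properties listed in the mission. The sketch below is a hypothetical helper (`exposed_fields`), not a PySyft API. It assumes the result is a dict or a list of dicts keyed by column name, and it only checks key names.

```python
# Property names the mission marks as sensitive (assumed to match the
# column names used in results; adjust if the real names differ).
SENSITIVE = {"Name", "Birth year", "Death year"}


def exposed_fields(result):
    """Return the sensitive property names that appear as keys in `result`.

    `result` may be a single dict or an iterable of dicts (one per row).
    Only key names are checked; values are not inspected.
    """
    rows = [result] if isinstance(result, dict) else list(result)
    found = set()
    for row in rows:
        found |= SENSITIVE & set(row)
    return sorted(found)


# An aggregate by occupation passes; a per-person table does not.
print(exposed_fields([{"Occupation": "nurse", "count": 3}]))  # → []
print(exposed_fields([{"Name": "Person 0", "Birth year": 1900}]))  # → ['Birth year', 'Name']
```

A clean result from this check is not sufficient on its own. A per-person value derived from the sensitive fields, such as an individual's lifespan computed as death year minus birth year, would still pass, so the submitted code itself still needs to be read.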

In [None]:
# Login to the Datasite as an "Admin"
admin_client = sy.login(email="info@openmined.org", password="changethis", port=8080)

In [None]:
# Add as many cells as you need ^_^

### End Part Two 🙌
Great job! You have just acted out the role of **Data Owner** and have made decisions on what can and cannot be answered about your data. To finish out the test, please see the **Post-Test Response** section below. 

#####
---
## ✏ Post-Test Response

### 1. Post-Test Survey

Please **upload your notebook** and tell us about your experience in the [**→→ form here ←←**](https://forms.gle/TTpgEBu2xjh6qoqg6) to conclude the test.


#####
### 🛑 Shut Down Test Environment
Run the cell below to shut down the local instance of the test environment.

In [None]:
# The following command will shut down the local test server
server.land()