In [1]:
!python --version

Python 3.9.1


In [2]:
import os
import openai

- Load the `.txt` file that contains your OpenAI API key.

In [3]:
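# Read the key and strip the trailing newline that text editors often append.
with open('Data/Input/api-key.txt', 'r') as file:
    api_key = file.read().strip()

# Expose the key as an environment variable and hand it to the openai client.
os.environ["OPENAI_API_KEY"] = api_key
openai.api_key = os.getenv("OPENAI_API_KEY")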
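
As a hedged aside, the key-loading step above can be wrapped in a small helper. The sketch below is not part of the original notebook: the function name `load_api_key` and its lookup order (environment variable first, then the text file) are assumptions made for illustration.

```python
import os
from pathlib import Path


def load_api_key(path, env_var="OPENAI_API_KEY"):
    """Return the API key from the environment if set, else from a text file.

    Hypothetical helper: the name and the fallback order are assumptions,
    not something the original notebook defines.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Strip whitespace so a trailing newline in the file never ends up in the key.
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"No API key found in {path!r} or in ${env_var}")
    return key
```

With a helper like this, the cell would reduce to `openai.api_key = load_api_key('Data/Input/api-key.txt')`, and an empty key file would fail early instead of at the first API call.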

<img src="Data/UP Data Science Society Logo 2.png" width=700>

# DSSoc Mentorship: Data Ethics

One of our Mentorship Sessions at the UP Data Science Society covered [Data Ethics](https://www.facebook.com/100079165176241/videos/643438693861938), with Sir Benedict Olgado as the resource speaker:

<img src="Data/Input/DSSoc Mentorship/title_card.jpg" alt="Mentorship Card"  width=500>

- Data had become the most valuable resource in the world and was being used to generate insights and innovations that impacted industries and communities: At the time of the event, data had already transformed the way organizations operate and make decisions. The insights generated from data had the potential to revolutionize industries and improve the lives of people. With the massive amounts of data being generated every day, it had become crucial to understand the importance of data ethics.

- The proper use of data was emphasized then more than ever: As data grew in importance and power, so did concern over its misuse. Ethical considerations had become increasingly important in data science and analytics. It was important to use data in a way that was responsible, transparent, and respectful of privacy and human rights.

- Sir Benedict Olgado, a PhD student at the University of California, Irvine, talked about the ethical use of data: Sir Olgado's expertise in data ethics made the talk a valuable opportunity for individuals and organizations to learn about the ethical considerations that should be taken into account when handling data. It was important to hear from experts who could offer insight into the responsible use of data.

- The talk was held in partnership with Diliman Learning Resource Center and was livestreamed on Facebook on Oct 6, 10-11 AM: The partnership between UP Data Science Society and Diliman Learning Resource Center provided a platform for individuals and organizations to learn more about data ethics. The livestreaming of the event on Facebook made it accessible to a wider audience, making it possible for more people to learn about the responsible use of data.

### Preliminaries

- Import `OpenAI_NoteTaker`.

In [4]:
from OpenAI_NoteTaker import OpenAI_NoteTaker

- Prepare role

In [5]:
role_txt = "You are a detail-oriented STEM student from the Philippines who wants to pursue a career as a data scientist who also specializes in science communication, which allows you to easily transcribe text to pure English."
print(role_txt)

You are a detail-oriented STEM student from the Philippines who wants to pursue a career as a data scientist who also specializes in science communication, which allows you to easily transcribe text to pure English.


# I. Discussion Proper

## A. Part 1 of Talk

- Summarize into 5 points.

In [6]:
%%time
Talk1a_NoteTaker = OpenAI_NoteTaker(input_dir='Data/Input/DSSoc Mentorship/Mentorship_Vid_Pt1a.mp4')

Talk1a_NoteTaker.take_notes(system_prompt=role_txt, 
                            n_items=5, 
                            convert2mp3=True,
                            export_mp3_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a.mp3',
                            show_notes=True)

Output .mp3 file saved to Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a.mp3
Conversion to mp3 successful.
Input file size: 1.7e+01 MB.
Duration: 1.1e+03 s
Input transcription tokens: 2702

NoteTaker's Summary in 5 points: 

1. The speaker views data both as a document and technology, and their background in information science and archiving informs this perspective. 
2. The speaker's relationship with data is tied to their commitment to human rights, and they work in developing information infrastructures for human rights organizations. 
3. Data-driven practice is becoming more ubiquitous and presents both promises of innovation and progress as well as threats and risks to social norms and activities. 
4. Pursuing data ethics requires acknowledging that data is never raw, never neutral, and always has a context that is related to the ecosystem it exists in. 
5. Ethical data practices should focus on improving people's lives and knowing when not to use data, as well as designing with

- Save raw transcription and summarized notes.
- View total pricing.

In [7]:
Talk1a_NoteTaker.save_notes(export_transcription_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a [Transcribed]', 
                            export_summary_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a [Notes]')

Talk1a_NoteTaker.get_total_job_price()

Transcript saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a [Transcribed].txt
Summarized note saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1a [Notes].txt

Job Price Breakdown: 

transcription_price: 0.11220 USD
summarization_price: 0.00585 USD
total_job_price: 0.11805 USD
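
The breakdown above is consistent with simple per-unit pricing. The sketch below is a reconstruction under stated assumptions, not the `OpenAI_NoteTaker` implementation. It assumes a transcription rate of 0.006 USD per minute of audio and a summarization rate of 0.002 USD per 1,000 tokens (rates that may have changed since), and the function name `estimate_job_price` is hypothetical. The inputs of 1122 s and 2925 tokens are back-calculated from the printed prices, because the log only shows the rounded duration `1.1e+03 s` and the input token count.

```python
WHISPER_USD_PER_MINUTE = 0.006  # assumed rate, not taken from the notebook
CHAT_USD_PER_1K_TOKENS = 0.002  # assumed rate, not taken from the notebook


def estimate_job_price(duration_s, total_tokens):
    """Hypothetical helper: estimate prices from audio length and token usage."""
    transcription = duration_s / 60 * WHISPER_USD_PER_MINUTE
    summarization = total_tokens / 1000 * CHAT_USD_PER_1K_TOKENS
    return {
        "transcription_price": transcription,
        "summarization_price": summarization,
        "total_job_price": transcription + summarization,
    }


# 1122 s and 2925 tokens are back-calculated from the printed prices above.
for job_type, value in estimate_job_price(1122, 2925).items():
    print(f"{job_type}: {value:.5f} USD")
```

Under these assumptions, transcription dominates the cost: the audio minutes account for roughly 95% of the job price, so shortening or trimming the recording would matter far more than shortening the summary.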


## B. Part 2 of Talk

- Summarize into 5 points.

In [8]:
%%time
Talk1b_NoteTaker = OpenAI_NoteTaker(input_dir='Data/Input/DSSoc Mentorship/Mentorship_Vid_Pt1b.mp4')

Talk1b_NoteTaker.take_notes(system_prompt=role_txt, 
                            n_items=5, 
                            convert2mp3=True,
                            export_mp3_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b.mp3',
                            show_notes=True)

Output .mp3 file saved to Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b.mp3
Conversion to mp3 successful.
Input file size: 1.8e+01 MB.
Duration: 1.2e+03 s
Input transcription tokens: 3018

NoteTaker's Summary in 5 points: 

1. Data ethics and the impact of datafication are often discussed in the context of the global north, but those examples of harm should inform developing nations as they mature their own datafication practices. 
2. The data science maturity of the Philippines is hindered by the poor management of records and documents which are necessary for accurate data. 
3. The video discusses the ethical approach of user-centered design when developing databases and algorithms, where the focus is on identifying user needs rather than designing for technology. 
4. Feminist approaches focus on centering care, responsibility, and minimizing harm through acts of refusal and clear commitments. 
5. Pursuing ethical data science requires interdisciplinary learning, constant engageme

- Save raw transcription and summarized notes.
- View total pricing.

In [9]:
Talk1b_NoteTaker.save_notes(export_transcription_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b [Transcribed]', 
                            export_summary_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b [Notes]')

Talk1b_NoteTaker.get_total_job_price()

Transcript saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b [Transcribed].txt
Summarized note saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt1b [Notes].txt

Job Price Breakdown: 

transcription_price: 0.11710 USD
summarization_price: 0.00647 USD
total_job_price: 0.12357 USD


## C. Q&A

- Summarize into 6 points.

In [10]:
%%time
QnA_NoteTaker = OpenAI_NoteTaker(input_dir='Data/Input/DSSoc Mentorship/Mentorship_Vid_Pt2.mp4')

QnA_NoteTaker.take_notes(system_prompt=role_txt, 
                         n_items=6, 
                         convert2mp3=True,
                         export_mp3_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2.mp3',
                         show_notes=True)

Output .mp3 file saved to Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2.mp3
Conversion to mp3 successful.
Input file size: 2.3e+01 MB.
Duration: 1.5e+03 s
Input transcription tokens: 3894

NoteTaker's Summary in 6 points: 

1. The speaker highlights the importance of ethical considerations in data science.
2. The speaker emphasizes the need for consent and privacy when collecting and analyzing data.
3. The ethical issues surrounding web scraping are discussed, and the importance of analyzing the purpose and content of web-scraped information is emphasized.
4. The speaker recommends the text "Raw Data is an Oxymoron" and the work of Luciano Floridi for those interested in learning more about data ethics.
5. The importance of developing domain-specific knowledge and interdisciplinary skills is stressed.
6. The speaker recommends learning the programming language R.
CPU times: total: 6.48 s
Wall time: 1min 22s


- Save raw transcription and summarized notes.
- View total pricing.

In [11]:
QnA_NoteTaker.save_notes(export_transcription_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Transcribed]', 
                         export_summary_dir='Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Notes]')

QnA_NoteTaker.get_total_job_price()

Transcript saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Transcribed].txt
Summarized note saved at: Data/Output/DSSoc Mentorship/Mentorship_Vid_Pt2 [Notes].txt

Job Price Breakdown: 

transcription_price: 0.15260 USD
summarization_price: 0.00816 USD
total_job_price: 0.16076 USD


# II. Get 'grand total' job price

- Prepare `NoteTaker_List`, a list of `NoteTaker` objects.
- Get the `.complete_job_price_dict` attribute from every item in `NoteTaker_List`.
- Construct `dict_list` out of all these dicts.

In [12]:
NoteTaker_List = [Talk1a_NoteTaker, 
                  Talk1b_NoteTaker,
                  QnA_NoteTaker]

dict_list = [NoteTaker.complete_job_price_dict for NoteTaker in NoteTaker_List]

- Import the `Counter` class from the `collections` module to perform "key-based addition" across all dicts.

In [13]:
from collections import Counter

- Get grand total.

In [14]:
grand_total_dict = Counter()
for d in dict_list:
    grand_total_dict.update(d)

grand_total_USD_dict = {job_type: f"{value:.5f} USD" for job_type, value in grand_total_dict.items()}

- Pretty print.

In [15]:
print('NoteTaking Grand Total Job Price Breakdown: \n')
for job_type, value_USD in grand_total_USD_dict.items():
    print(f"{job_type}: {value_USD}")

NoteTaking Grand Total Job Price Breakdown: 

transcription_price: 0.38191 USD
summarization_price: 0.02048 USD
total_job_price: 0.40239 USD
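
To see the key-based addition in isolation, the sketch below repeats the `Counter` pattern on plain dicts. The three dicts are copied from the rounded per-job breakdowns printed earlier and stand in for the real `complete_job_price_dict` values, whose exact structure is assumed here. Because the inputs are already rounded to five decimals, the last digit of a sum can differ from the grand total printed above.

```python
from collections import Counter

# Rounded per-job prices in USD, copied from the printed breakdowns.
dict_list = [
    {"transcription_price": 0.11220, "summarization_price": 0.00585, "total_job_price": 0.11805},
    {"transcription_price": 0.11710, "summarization_price": 0.00647, "total_job_price": 0.12357},
    {"transcription_price": 0.15260, "summarization_price": 0.00816, "total_job_price": 0.16076},
]

grand_total = Counter()
for d in dict_list:
    # Counter.update adds values key by key instead of overwriting them.
    grand_total.update(d)

for job_type, value in grand_total.items():
    print(f"{job_type}: {value:.5f} USD")
```

A plain `dict.update` would overwrite each key with the last job's value, whereas `Counter.update` accumulates, which is why a `Counter` is used for the grand total.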
