<a href="https://colab.research.google.com/github/SaraIrfa/AWS-SageMakerStudio/blob/main/AmazonSageMaker_Project_Sara_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Installing libraries for various functionalities

# python-dotenv: Allows you to manage environment variables in a .env file.
!pip install python-dotenv

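# openai: Provides access to OpenAI's language models and API.
# Pinned below 1.0 because this notebook calls openai.ChatCompletion, which was removed in openai 1.0.
!pip install "openai<1"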

# youtube_dl: A tool for downloading videos and audios from YouTube.
!pip install youtube_dl

# youtube_transcript_api: Allows you to fetch YouTube video transcripts.
!pip install youtube_transcript_api

# torchaudio: A PyTorch-based library for audio processing and analysis.
!pip install torchaudio

# sentencepiece: A library for text segmentation and tokenization, useful for NLP tasks.
!pip install sentencepiece

# sacremoses: A tokenization and detokenization library for natural language processing.
!pip install sacremoses

# transformers: Provides pre-trained models and utilities for NLP tasks using the Hugging Face Transformers library.
!pip install transformers



In [None]:
import re  # Regular expression support for text manipulation
from youtube_transcript_api import YouTubeTranscriptApi  # Fetch YouTube video transcripts
import torch  # PyTorch for machine learning capabilities
import torchaudio  # Audio processing using PyTorch's torchaudio library
import openai  # Access OpenAI's APIs and language models
import textwrap  # Text formatting and paragraph wrapping
from transformers import pipeline  # Utilize pre-trained NLP models via Transformers library

In [None]:
######################################
# Specify the YouTube video URL
######################################
youtube_url = "https://www.youtube.com/watch?v=IRyC6QyBY58"



######################################
# Extract the video ID using regular expressions
######################################
# Search for the pattern "v=..." in the YouTube URL
match = re.search(r"v=([A-Za-z0-9_-]+)", youtube_url)
if match:
    # If a match is found, extract the video ID using group(1)
    video_id = match.group(1)
else:
    # If no match is found, raise an error indicating an invalid URL
    raise ValueError("Invalid YouTube URL")



######################################
# Get the transcript from YouTube
######################################
# Use the YouTubeTranscriptApi to fetch the transcript of the video
transcript = YouTubeTranscriptApi.get_transcript(video_id)




######################################
# Concatenate the transcript into a single string
######################################
# Join the text of every segment (dictionary) in the transcript, separated by spaces
transcript_text = " ".join(segment["text"] for segment in transcript)




######################################
# Print the concatenated transcript text
######################################
# Display the complete concatenated transcript on the console
print(transcript_text)

- Do you want to remake
the way we make things? If your answer is yes, you
are at the right place. Welcome everyone. My name is Muhammad Sajid. I'm a Solutions Architect at AWS. With me, I have Marcus Ulmefors, Director Data and Machine
Learning Platforms at Northvolt. He's here to talk about how Europe's first homegrown
gigafactory, Northvolt, is producing sustainable back batteries in their connected factory. You will meet Marcus very soon. Both me and Marcus are
from Stockholm, Sweden. All right, I have a habit of
keeping my browser tabs open. There is a special term for
this called "Tab-tsundoku." Tsundoku is a Japanese word for impulsively acquiring
and piling up books, without ever intending to read them. So raise your hands if your
browser looks like this on a daily basis. Welcome to the club. (chuckles) But we can't keep
opening new tabs forever. Anything we can't keep doing forever is obviously not sustainable. So you're not here to listen
to my unsustainable habits. You are h
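The video ID extraction above only recognizes the `v=...` query form. As a minimal sketch, assuming short `youtu.be` links should also be accepted, the helper below parses the URL with the standard library instead of a regular expression. The name `extract_video_id` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or raise ValueError."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path component
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        # Standard watch links carry the ID in the "v" query parameter
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""
    if not candidate:
        raise ValueError("Invalid YouTube URL")
    return candidate


print(extract_video_id("https://www.youtube.com/watch?v=IRyC6QyBY58"))
print(extract_video_id("https://youtu.be/IRyC6QyBY58"))
```

Both calls should yield the same ID, and the result can be passed to the transcript API in place of `video_id`.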

In [None]:
# With the full transcript in hand, translate it using an open-source
# pretrained model from the Hugging Face Transformers library.
# Using a pretrained model avoids the compute cost and time of training one.
# Hugging Face offers 2,500+ translation models; to try another
# language pair, change the "model_checkpoint" variable below.



# Import the required function 'pipeline' from the Transformers library
from transformers import pipeline

# Replace this with your own checkpoint
# Specify the pretrained translation model you want to use
model_checkpoint = "Helsinki-NLP/opus-mt-en-es"

# Create a translation pipeline using the specified model
translator = pipeline("translation", model=model_checkpoint)

# Define the maximum segment length in characters (not tokens)
# Short segments help each piece fit within the model's input size
max_length = 512

# Split the input text into smaller segments on word boundaries
# textwrap.wrap (imported in the imports cell) avoids cutting words in half,
# which fixed-width slicing of the string would do
segments = textwrap.wrap(transcript_text, max_length)

# Translate each segment and collect the results
translated_segments = []
for segment in segments:
    # Use the translation pipeline to translate the current segment
    result = translator(segment)
    # Keep the translated text from the current segment
    translated_segments.append(result[0]['translation_text'])

# Join the translated segments with spaces so sentences do not run together
translated_text = " ".join(translated_segments)

# Display the translated transcript
print(translated_text)

Si su respuesta es sí, usted está en el lugar correcto. Bienvenido a todos. Mi nombre es Muhammad Sajid. Soy un Arquitecto de Soluciones en AWS. Conmigo, tengo Marcus Ulmefors, Director de Plataformas de Aprendizaje de Datos y Máquinas en Northvolt. Él está aquí para hablar de cómo la primera gigafactoría casera de Europa, Northvolt, está produciendo baterías de espalda sostenibles en su fábrica conectada.Un hábito de mantener abiertas las pestañas de mi navegador. Hay un término especial para esto llamado "Tab-tsundoku". Tsundoku es una palabra japonesa para adquirir y acumular libros impulsivamente, sin tener la intención de leerlos. Así que levanta las manos si tu navegador se ve así a diario. Bienvenido al club. (risas) Pero no podemos seguir abriendo nuevas pestañas para siempre. Cualquier cosa que no podamos seguir haciendo para siempre es obviamente no sostenible. Así que no estás aquí para escuchar mis hábitos insostenibles.0 sesión de ruptura, donde aprenderá cómo los clientes
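The translation cell splits the transcript into pieces of at most 512 characters, so a piece can end in the middle of a sentence. One alternative is to group whole sentences into each piece. The following is a minimal sketch under that assumption; `split_into_sentence_chunks` is a hypothetical helper, and its sentence detection is a rough punctuation heuristic rather than a real sentence tokenizer.

```python
import re


def split_into_sentence_chunks(text, max_chars=512):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace (a rough heuristic)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # The sentence still fits, or it starts a new chunk
            current = candidate
        else:
            # The chunk is full: store it and start a new one
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "Welcome everyone. My name is Muhammad Sajid. I'm a Solutions Architect at AWS."
for piece in split_into_sentence_chunks(sample, max_chars=45):
    print(piece)
```

The returned list could be passed to the translator one element at a time, in place of `segments`.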

In [None]:
# Import the necessary libraries
from transformers import pipeline, AutoTokenizer

# Instantiate the tokenizer and the summarization pipeline using a pre-trained model
tokenizer = AutoTokenizer.from_pretrained('stevhliu/my_awesome_billsum_model')
summarizer = pipeline("summarization", model='stevhliu/my_awesome_billsum_model', tokenizer=tokenizer)

# Define the chunk size in number of words
chunk_size = 200  # lower this value if a chunk exceeds the model's maximum input length in tokens

# Split the transcript text into individual words
words = transcript_text.split()

# Split the transcript text into chunks of the defined size
chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

# Initialize an empty list to store the summaries
summaries = []

# Loop through each chunk and summarize it
for chunk in chunks:
    # Generate a summary for the current chunk
    summary = summarizer(chunk, max_length=100, min_length=30, do_sample=False)

    # Extract the generated summary text
    summary_text = summary[0]['summary_text']

    # Add the summary to the list of summaries
    summaries.append(summary_text)

# Join the individual summaries back together into a single final summary
final_summary = ' '.join(summaries)

# Print the final summarized text
print(final_summary)

I'm a Solutions Architect at AWS. With me, I have Marcus Ulmefors, Director Data and Machine Learning Platforms at Northvolt. You will meet Marcus very soon. Both me and Marcus are from Stockholm, Sweden. There is a special term for this called "Tab-tsundoku" I expected every one of you to raise your hands. You know why? Just like security, sustainability is everyone's responsibility. Let's look at the agenda. Today, you will learn how manufacturers can use data to become more sustainable. What are some of the modern technologies you can use to optimize factory operations? I will also show you a short demo of AWS IoT TwinMaker. greenhouse gases trap heat in earth's atmosphere that in return, makes the planet much hotter. There are many different greenhouse gases, including carbon dioxide, methane, nitrous oxide, and many more. The collective term "greenhouse gases" goes by the more scientific name called carbon dioxide equivalent. the 10 gigaton is 10 stacks of elephants stretching fro
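The summarization cell cuts the transcript into consecutive 200-word chunks, so a thought that straddles two chunks is summarized in two unrelated halves. A common mitigation is to let neighbouring chunks share a few words. The sketch below illustrates the idea; `chunk_words` and its `overlap` parameter are hypothetical and do not appear in the original notebook.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of chunk_size words that share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


for piece in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(piece)
```

With `overlap=0` the function behaves like the list comprehension in the cell above. A larger overlap trades some repeated text in the summaries for more context per chunk.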

In [None]:
import textwrap
import openai

# Function to split the text into chunks of a specified size
def split_text_into_chunks(text, max_chunk_size):
    return textwrap.wrap(text, max_chunk_size)

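# Read your OpenAI API key from the OPENAI_API_KEY environment variable instead of hardcoding it
import os
openai.api_key = os.environ.get("OPENAI_API_KEY", "")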

# Define the maximum chunk size for splitting the transcript
max_chunk_size = 4000

# Split the transcript text into manageable chunks
transcript_chunks = split_text_into_chunks(transcript_text, max_chunk_size)

# Initialize an empty string to store the generated summaries
summaries = ""

# Loop through each chunk of the transcript and generate summaries
for chunk in transcript_chunks:
    # Create a chat completion request using the gpt-3.5-turbo-16k model
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"{chunk}\n\nCreate short concise summary"}
        ],
        max_tokens=250,   # Set the maximum length of the response
        temperature=0.5   # Control the randomness of the output
    )

    # Extract the generated summary from the response and add it to the summaries string
    summaries += response['choices'][0]['message']['content'].strip() + " "

# Print the final combined summary
print("Summary:")
print(summaries)

Summary:
In this Level 300 breakout session, Muhammad Sajid, a Solutions Architect at AWS, discusses how manufacturers can leverage AWS to become more sustainable. He highlights the importance of sustainability and the impact of greenhouse gases on global warming. Sajid also introduces Marcus Ulmefors, Director Data and Machine Learning Platforms at Northvolt, who shares their experience of using AWS to operate a connected factory for sustainable battery production. The session includes topics such as using data to optimize factory operations, modern technologies for sustainability, and a demo of AWS IoT TwinMaker. Sajid concludes by providing call to actions and learning resources for attendees. The manufacturing industry is a significant contributor to greenhouse gas emissions and resource consumption. However, with the advent of the fourth industrial revolution and the integration of data and cloud technology, there is an opportunity to improve sustainability and profitability in ma
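The loop above sends one request per chunk, so a single transient failure (for example a rate limit or timeout) aborts the whole run. A minimal sketch of a retry wrapper with exponential backoff follows. `call_with_retries` is a hypothetical helper. It catches every exception for simplicity, which is broader than ideal; in practice the `except` clause would be narrowed to the client's transient error types.

```python
import time


def call_with_retries(request_fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait 1x, 2x, 4x, ... the base delay before trying again
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in function that fails once and then succeeds
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 2:
        raise RuntimeError("temporary failure")
    return "summary text"


print(call_with_retries(flaky_request, base_delay=0.01))
```

In the loop, the API call could be wrapped in a `lambda` and passed as `request_fn`, leaving the rest of the cell unchanged.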

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a technical instructor."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate steps to follow from text."},
    ]
)

# The assistant's reply
guide = response['choices'][0]['message']['content']

print("Steps:")
print(guide)

Steps:
1. Understand the global impact of greenhouse gases and the need for sustainability in the manufacturing sector.
2. Learn about the ISA95 framework and its role in integrating enterprise and control systems.
3. Explore modern technologies that can help manufacturers become more sustainable, such as IoT, AI/ML, and cloud computing.
4. Discover how AWS IoT and ad services can be used to collect and analyze data from factory operations.
5. Explore the concept of digital twins and how they can be used to improve manufacturing processes and operations.
6. Learn about Northvolt, Europe's first homegrown gigafactory, and their approach to sustainable battery production.
7. Understand the technical architecture of Northvolt's connected factory, including monitoring, controls, and data platforms.
8. Explore the scalability of the Northvolt digital blueprint and how it can be applied to multiple factories.
9. Take advantage of the resources and sessions provided to further explore sustain
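The step-generation cell above and the quiz cell that follows use the same message layout: a system prompt, the full transcript, then the task instruction. As a small sketch, the layout can be factored into one function so that only the two prompts differ between cells. `build_messages` is a hypothetical helper, and the transcript string in the example is a placeholder.

```python
def build_messages(system_prompt, transcript, instruction):
    """Return a chat message list: system prompt, transcript, then the task."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
        {"role": "user", "content": instruction},
    ]


messages = build_messages(
    "You are a technical instructor.",
    "Example transcript text.",  # placeholder standing in for transcript_text
    "Generate steps to follow from text.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the `messages` argument of the chat completion call.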

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that generates questions."},
        {"role": "user", "content": transcript_text},
        {"role": "user", "content": "Generate 50 quiz questions based on the text with multiple choices."},
    ]
)

# The assistant's reply
quiz_questions = response['choices'][0]['message']['content']

print("Quiz Questions:")
print(quiz_questions)

Quiz Questions:
1. What is the average temperature of the Earth?
   a) 10 degrees Celsius
   b) 15 degrees Celsius
   c) 20 degrees Celsius
   d) 25 degrees Celsius

2. How many gigatons of greenhouse gases are contributed by the manufacturing sector each year?
   a) 5 gigatons
   b) 10 gigatons
   c) 15 gigatons
   d) 20 gigatons

3. What is the collective term for greenhouse gases?
   a) Carbon dioxide equivalent
   b) Nitrogen oxide equivalent
   c) Methane equivalence
   d) Oxygen equivalence

4. According to the IEA, how much CO2 did the industrial activity emit in 2021?
   a) 5 gigatons
   b) 7.5 gigatons
   c) 9.4 gigatons
   d) 12 gigatons

5. What is the purpose of the fourth industrial revolution?
   a) To bring physical and digital systems together
   b) To automate manufacturing processes
   c) To increase global productivity
   d) To reduce greenhouse gas emissions

6. What is the ISA95 framework used for?
   a) Integrating enterprise and control systems
   b) Monitoring f
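The quiz comes back as plain text, with numbered questions followed by options lettered a) to d). To reuse it (for example in a form or a grading script), it helps to parse it into a structure. The sketch below assumes the layout shown in the output above. `parse_quiz` is a hypothetical helper, and the model is not guaranteed to keep this exact format on every run.

```python
import re


def parse_quiz(text):
    """Parse numbered questions with lettered options into a list of dicts."""
    questions = []
    for line in text.splitlines():
        line = line.strip()
        question_match = re.match(r"^(\d+)\.\s+(.*)$", line)
        option_match = re.match(r"^([a-d])\)\s+(.*)$", line)
        if question_match:
            # A line such as "3. What is ...?" starts a new question
            questions.append({
                "number": int(question_match.group(1)),
                "question": question_match.group(2),
                "options": {},
            })
        elif option_match and questions:
            # A line such as "a) ..." belongs to the most recent question
            questions[-1]["options"][option_match.group(1)] = option_match.group(2)
    return questions


sample = """3. What is the collective term for greenhouse gases?
   a) Carbon dioxide equivalent
   b) Nitrogen oxide equivalent
   c) Methane equivalence
   d) Oxygen equivalence
"""
for item in parse_quiz(sample):
    print(item["number"], item["question"], sorted(item["options"]))
```

Lines that match neither pattern, such as blank lines, are ignored.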