In [1]:
!pip install youtube-transcript-api

Collecting youtube-transcript-api
  Downloading youtube_transcript_api-0.6.2-py3-none-any.whl.metadata (15 kB)
Downloading youtube_transcript_api-0.6.2-py3-none-any.whl (24 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-0.6.2


In [2]:
!pip install google-generativeai



In [3]:
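import os

# Read the key from the environment instead of hardcoding it in the notebook
# (the variable name GOOGLE_API_KEY is an assumption; use whatever you export)
google_api = os.environ["GOOGLE_API_KEY"]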

In [4]:
import google.generativeai as genai

In [5]:
genai.configure(api_key=google_api)
model = genai.GenerativeModel("gemini-1.5-flash")

In [6]:
import re
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled

In [7]:
# List of supported languages with their ISO codes
supported_languages = {
    "English": "en",
    "Japanese": "ja",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Chinese (Simplified)": "zh-Hans",
    "Chinese (Traditional)": "zh-Hant",
    "Korean": "ko",
    "Russian": "ru",
    "Portuguese": "pt",
    "Italian": "it",
    "Dutch": "nl",
    "Arabic": "ar",
    "Hindi": "hi",
    "Swedish": "sv",
    "Norwegian": "no",
    "Danish": "da",
    "Finnish": "fi",
    "Greek": "el",
    "Polish": "pl",
}

In [8]:
# Extract the 11-character video ID from a YouTube URL with a regular expression
def extract_video_id(url):
    pattern = r"(?:https?://)?(?:www\.)?(?:youtube\.com|youtu\.be)/(?:watch\?v=|embed/|v/|.+/|)([\w-]{11})"
    match = re.search(pattern, url)
    return match.group(1) if match else None
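
The regex above accepts any path on a YouTube host, so it can be useful to compare it with a stricter parser. The following is a minimal sketch, not part of the original notebook: `extract_video_id_strict` is a hypothetical name, and the set of accepted hosts and path prefixes (`embed`, `v`, `shorts`) is an assumption.

```python
import re
from urllib.parse import urlparse, parse_qs

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-"
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id_strict(url):
    """Hypothetical alternative to extract_video_id based on URL parsing."""
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            known = ("embed", "v", "shorts")  # assumed set of path prefixes
            candidate = parts[1] if len(parts) >= 2 and parts[0] in known else ""
    else:
        candidate = ""

    return candidate if _ID_RE.fullmatch(candidate) else None


print(extract_video_id_strict("https://youtu.be/qV3yjIyj7Dc?si=fT2pWkNSecaoZwmL"))
```

Parsing the URL first means tracking parameters such as `si=` never reach the ID check, and links from other hosts return `None`.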

In [9]:
# Function to fetch the transcript, translating it to English when needed
def fetch_and_translate_transcript(video_id):
    transcript_paragraph = ""

    try:
        # Get available transcripts (manual and auto-generated)
        transcripts = list(YouTubeTranscriptApi.list_transcripts(video_id))

        # Prefer an English transcript wherever it appears in the listing
        chosen = next((t for t in transcripts if t.language_code == "en"), None)

        if chosen is None:
            # Otherwise translate the first transcript in a supported language
            source = next(
                (t for t in transcripts if t.language_code in supported_languages.values()),
                None,
            )
            chosen = source.translate("en") if source else None

        if chosen is None:
            print("No transcript in a supported language was found.")
        else:
            transcript_paragraph = " ".join(entry["text"] for entry in chosen.fetch())

    except TranscriptsDisabled:
        print("Transcripts are disabled for this video.")
    except Exception as e:
        print("An error occurred:", e)

    return transcript_paragraph
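
Caption text often contains line breaks and bracketed cues such as `[Music]`, which then end up in the prompt. The sketch below shows one way to clean entries before joining them. It assumes the list-of-dicts entry format returned by `fetch()` in youtube-transcript-api 0.6.x (`text`, `start`, `duration`); `entries_to_paragraph` and the sample data are hypothetical.

```python
import re

# Bracketed cues such as [Music] or [Applause]
_CUE_RE = re.compile(r"\[[^\]]*\]")


def entries_to_paragraph(entries):
    """Hypothetical helper: join caption entries into one clean paragraph."""
    pieces = []
    for entry in entries:
        text = _CUE_RE.sub(" ", entry["text"])  # drop bracketed cues
        text = " ".join(text.split())           # collapse newlines and runs of spaces
        if text:
            pieces.append(text)
    return " ".join(pieces)


# Made-up entries in the assumed 0.6.x format
sample_entries = [
    {"text": "Hello and\nwelcome", "start": 0.0, "duration": 2.1},
    {"text": "[Music]", "start": 2.1, "duration": 1.0},
    {"text": "to the  interview.", "start": 3.1, "duration": 2.4},
]
print(entries_to_paragraph(sample_entries))  # Hello and welcome to the interview.
```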

In [10]:
# YouTube URL
url = "https://youtu.be/qV3yjIyj7Dc?si=fT2pWkNSecaoZwmL"

# Extract video ID from URL
video_id = extract_video_id(url)

# Fetch the transcript paragraph (uncomment the print to inspect it)
if video_id:
    transcript_paragraph = fetch_and_translate_transcript(video_id)
    # print(transcript_paragraph)
else:
    transcript_paragraph = ""  # keep the name defined for the cells below
    print("Invalid YouTube URL.")


In [11]:
def summarize_text(transcript_paragraph):
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([f"The following text is the transcript of a YouTube video. Summarize it in detail: {transcript_paragraph}"])
    return response.text

summary = summarize_text(transcript_paragraph)
print(summary)

This transcript is a conversation with Alexander Dyakonov, a professor at Moscow State University specializing in machine learning, who is known for his participation in data science competitions like Kaggle.  

**Key Takeaways:**

* **Kaggle's Value:**
    * **Testing Ground:** Kaggle provides a real-world testing ground for machine learning methods, debunking myths and pushing the boundaries of what's possible.
    * **Experience Booster:** Participating in Kaggle competitions offers invaluable experience for aspiring data scientists, even if they are not directly applicable to real-world problems.
    * **Myth Buster:** It challenges the misconception that academic methods are always superior to real-world applications. 
    * **Order in Chaos:** It brings order to the field by providing a standardized platform for comparing different algorithms and approaches.
* **The Downside of Kaggle:**
    * **Addiction:**  Kaggle can become addictive, taking away time from other career or pers
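
If the transcript fetch fails, `transcript_paragraph` is an empty string and the model is asked to summarize nothing. One possible guard is to validate and bound the text before building the prompt. This is a sketch only: `build_summary_prompt` is a hypothetical helper, and the `max_chars` default is an arbitrary assumption, not a documented model limit.

```python
def build_summary_prompt(transcript, max_chars=100_000):
    """Hypothetical helper: validate and bound the transcript, then build the prompt."""
    text = " ".join(transcript.split())  # normalize whitespace
    if not text:
        raise ValueError("Transcript is empty; nothing to summarize.")
    if len(text) > max_chars:
        # Cut at the last space inside the limit to avoid splitting a word
        text = text[:max_chars].rsplit(" ", 1)[0]
    return (
        "The following text is the transcript of a YouTube video. "
        "Summarize it in detail.\n\n"
        f"Transcript:\n{text}"
    )


print(build_summary_prompt("hello world again", max_chars=13))
```

The returned string could be passed to `model.generate_content` in place of the inline f-string.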

In [12]:
def generate_faq(transcript_paragraph):
    model = genai.GenerativeModel("gemini-1.5-flash")

    response = model.generate_content([f"""
    The following text is a transcript of a YouTube Video: {transcript_paragraph}

    Generate 10 frequently asked questions (FAQs) about the topics it covers.
    Then answer each FAQ using only information from the transcript.

    Please ensure that the generated FAQs and answers do not infringe on any copyrights.
    If you encounter any potentially copyrighted material, please skip it and focus on other parts of the transcript.
    """])
    return response.text

faq = generate_faq(transcript_paragraph)
print(faq)

## FAQs about Machine Learning and Kaggle:

**1. FAQ: What is Kaggle, and is it really useful for machine learning?**

**Answer:** Kaggle is a platform for data science competitions. While not directly solving real-world business problems, it offers valuable benefits. It provides a platform for testing machine learning methods in "combat conditions," debunking myths and verifying methods' effectiveness. Participants gain practical experience and motivation, and the platform has contributed to a more structured approach to evaluating method effectiveness. 

**2. FAQ: Are Kaggle competitions a good measure of someone's machine learning skills for a job?**

**Answer:** While Kaggle experience is beneficial, it's not a definitive indicator of job performance. Real-world tasks differ from competitions, demanding skills like reliability and application-specific knowledge. Companies should consider a combination of skills, including general knowledge, programming abilities, and the ability to
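
To reuse the FAQs elsewhere, the Markdown can be split into question and answer pairs. The sketch below assumes the model keeps the `**N. FAQ: ...**` / `**Answer:** ...` layout seen in the output above, which is not guaranteed from run to run. `parse_faq` and the sample text are hypothetical.

```python
import re

# One FAQ entry: a bold numbered question followed by a bold "Answer:" label
FAQ_RE = re.compile(
    r"\*\*\d+\.\s*FAQ:\s*(?P<q>.+?)\*\*\s*"
    r"\*\*Answer:\*\*\s*(?P<a>.+?)(?=\n\s*\*\*\d+\.\s*FAQ:|\Z)",
    re.DOTALL,
)


def parse_faq(markdown_text):
    """Hypothetical helper: return a list of (question, answer) tuples."""
    return [
        (m.group("q").strip(), m.group("a").strip())
        for m in FAQ_RE.finditer(markdown_text)
    ]


# Made-up sample in the assumed layout
sample = (
    "## FAQs\n\n"
    "**1. FAQ: What is Kaggle?**\n\n"
    "**Answer:** A platform for data science competitions.\n\n"
    "**2. FAQ: Is it useful?**\n\n"
    "**Answer:** Yes, for practice.\n"
)
for q, a in parse_faq(sample):
    print(q, "->", a)
```

An empty result means the model used a different layout, so callers should fall back to the raw text.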

In [13]:
question = input("Enter your question: ")

In [14]:
def question_text(transcript_paragraph, question):
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([f"Please answer the following question based on the provided text: {transcript_paragraph}. Question: {question}"])
    response_text = response.text
    # Replace newlines with spaces so words on adjacent lines do not run together
    cleaned_text = response_text.replace('\n', ' ')
    return cleaned_text

answer = question_text(transcript_paragraph, question)
answer

'The main speaker in this video is **Alexander Gennadievich Dyakonov**. '
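
Each helper in this notebook builds its own `GenerativeModel`, which makes the logic hard to check without network access. A minimal sketch of an alternative passes the model in as an argument, so a stand-in object can be used for offline testing. `answer_question`, `FakeModel`, and `FakeResponse` are hypothetical names. The only assumption about the real model is that `generate_content` accepts a prompt and returns an object with a `.text` attribute, as in the cells above.

```python
def answer_question(model, transcript, question):
    """Hypothetical variant of question_text that receives the model as an argument."""
    prompt = (
        "Answer the question using only the transcript below.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}"
    )
    response = model.generate_content(prompt)
    # Collapse newlines and repeated spaces into single spaces
    return " ".join(response.text.split())


class FakeResponse:
    """Stand-in for a model response; only the .text attribute is needed."""

    def __init__(self, text):
        self.text = text


class FakeModel:
    """Stand-in model that records prompts and returns a canned reply."""

    def __init__(self):
        self.prompts = []

    def generate_content(self, prompt):
        self.prompts.append(prompt)
        return FakeResponse("The main speaker is\nAlexander Dyakonov. ")


fake = FakeModel()
print(answer_question(fake, "some transcript", "Who is the main speaker?"))
```

With the real client, the same function would be called as `answer_question(model, transcript_paragraph, question)`.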