# <span style="color:red">I. Data Scrapping From YouTube</span>

## 1.Why Scrapping YouTube

### <span style="color:green">YouTube API </span> vs <span style="color:green">YouTube scraping</span>

**YouTube Data API** is the official way to get data from the platform, including information about videos, playlists, and content creators. However, there are at least three good reasons why **scraping YouTube** is better than relying solely on its API:

-**Flexibility and Customization**: With a YouTube spider, you can tailor the code to select only the data you need. This level of customization helps you collect the exact information for your specific use case. In contrast, the API only gives you access to predefined data.
-**Access to unofficial data**: The API provides access to specific sets of data selected by YouTube. This means that some data you currently rely on May no longer be available in the future. Web scraping allows you instead to obtain any additional information available on the YouTube website, even if not exposed through the API.
-**No limitation**: YouTube APIs are subject to rate limiting. This restriction determine the frequency and volume of requests that you can make in a given time frame. By interacting directly with the platform, you can circumvent any limitation.


In our Project we need the all comments and their authors, so we have no filters for the comments we will select it all, so we will use YouTube Api

## 2. Scraping YouTube With GoogleApiClient

**GoogleApiClient**: The google-api-python-client library is a Python client library for accessing Google's APIs. It provides an easy-to-use interface for interacting with various Google services programmatically, such as YouTube, Google Drive, Gmail, Google Calendar, Google Sheets, and many others.



## 3. Library Importation

In [None]:
!pip install google-api-python-client

## <span style='color:blue'>How To Get The API KEY From Google Cloud Console</span>
![# API_KEY](images/11.png)
![# API_KEY-](images/12.png)

In [5]:
import googleapiclient.discovery
import pandas as pd

# Set up YouTube API service
api_service_name = "youtube"
api_version = "v3"
API_KEY = "AIzaSyAQtdqS9bIEj0_-Uk3eWTLLxczpUqIxYqE"

youtube = googleapiclient.discovery.build(api_service_name, api_version, developerKey=API_KEY)

# Function to retrieve all comments from a video
def get_all_comments(video_id):
    video_info = youtube.videos().list(part="snippet",id=video_id).execute()
    comments = []
    video_title = video_info['items'][0]['snippet']['title']
    channel_title = video_info['items'][0]['snippet']['channelTitle']

    nextPageToken = None


    while True:
        request = youtube.commentThreads().list(
            part="snippet",
            videoId= video_id,
            maxResults=100,  # Max results per page (default is 100, maximum is 100)
            pageToken=nextPageToken
        )
        response = request.execute()


        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append({
                'Video_Title': video_title,
                'Channel_Title': channel_title,
                'Author_Name': comment['authorDisplayName'],
                'Published_At': comment['publishedAt'],
                'Like_Count': comment.get('likeCount', 0),
                'Comment': comment['textDisplay']
            })
        nextPageToken = response.get('nextPageToken')
        if not nextPageToken:
            break  # No more pages

    return comments

video_ids = ['Li9PVsGp1wo','IJHPpTYtIqk','JOsHoN_m0ZE','s1xbQVNGSPQ','Qr8eZSVaw10']
all_comments =[]

for i in range(len(video_ids)):
    all_comments.append(get_all_comments(video_ids[i]))

df = pd.DataFrame(all_comments)
# Save DataFrame to CSV
df.to_csv('Comments_Data.csv',encoding='utf-8-sig', index=False)

In [6]:
df.size

769000

In [8]:
all_comments[0]

[{'Video_Title': 'Psyco M - Plume',
  'Channel_Title': 'Psyco-M',
  'Author_Name': '@user-im9oh5ey9z',
  'Published_At': '2024-02-29T11:45:37Z',
  'Like_Count': 1,
  'Comment': '2024  وبش نقعد ديما نسمعك'},
 {'Video_Title': 'Psyco M - Plume',
  'Channel_Title': 'Psyco-M',
  'Author_Name': '@NOROTVDZ',
  'Published_At': '2024-02-27T02:03:34Z',
  'Like_Count': 1,
  'Comment': '🤍🇩🇿🤍🇩🇿🤍🇩🇿🤍🇩🇿🤍🇩🇿🤍🇩🇿'},
 {'Video_Title': 'Psyco M - Plume',
  'Channel_Title': 'Psyco-M',
  'Author_Name': '@logic7575',
  'Published_At': '2024-02-26T09:46:23Z',
  'Like_Count': 1,
  'Comment': 'Meilleur chanson de tout les temps ! 😊'},
 {'Video_Title': 'Psyco M - Plume',
  'Channel_Title': 'Psyco-M',
  'Author_Name': '@user-tw6gy7em6r',
  'Published_At': '2024-02-25T21:59:48Z',
  'Like_Count': 0,
  'Comment': 'مواقع التواصل الاجتماعي مش جوي لكل عملت شان يتيوب باش نكونسومي هل سلعة ❤2024/2'},
 {'Video_Title': 'Psyco M - Plume',
  'Channel_Title': 'Psyco-M',
  'Author_Name': '@user-kp4jf7ne1i',
  'Published_At': '2024

In [1]:
# Convert to DataFrame and save to CSV
df.to_csv('comments.csv',encoding='utf-8-sig')

NameError: name 'df' is not defined

In [None]:
# the most liked comment
df_sorted = df.sort_values(by='likeCount',ascending=False)
most_liked = df_sorted.iloc[0]

In [None]:
most_liked