Sure, here is the full process for downloading a video file from Yahoo Finance: collect the required cookies with `requests` (with `BeautifulSoup` available for HTML parsing), then download the video with `ffmpeg`.

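### 1. **Find the `m3u8` URL in the Network tab**
   - **Open the browser developer tools**: `F12` or `Ctrl + Shift + I` (Windows/Linux), or `Cmd + Option + I` (Mac).
   - **Select the Network tab**: Click the "Network" tab at the top.
   - **Play the video**: Play the video or reload the page to see the network requests.
   - **Locate the `m3u8` file**: Find the URL ending in `.m3u8` in the list and copy it. This URL points to the master playlist of the streaming video.

   > **Note**: A URL that shows up in the Network tab can sometimes also be found with `requests`, but only if it is embedded in the page's HTML, in which case you parse the HTML for it. Otherwise, use the URL discovered in the developer tools directly.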
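
As a rough sketch of the HTML-parsing route mentioned in the note, the snippet below searches page source for `.m3u8` URLs with a regular expression. The `sample_html` string and the `find_m3u8_urls` helper are hypothetical illustrations; a real page may assemble the URL in JavaScript, in which case nothing will be found this way.

```python
import re

# Hypothetical page source; real pages may build the URL in JavaScript instead.
sample_html = '''
<script>
  var player = {"streams": [{"url": "https:\\/\\/video.example.com\\/hls\\/master.m3u8?token=abc"}]};
</script>
<video src="https://cdn.example.com/clip/index.m3u8"></video>
'''

def find_m3u8_urls(html: str) -> list[str]:
    """Return every http(s) URL ending in .m3u8, plus an optional query string."""
    # Undo JSON-style escaped slashes (\/) before matching.
    unescaped = html.replace("\\/", "/")
    pattern = r'https?://[^\s"\'<>]+?\.m3u8(?:\?[^\s"\'<>]*)?'
    return re.findall(pattern, unescaped)

print(find_m3u8_urls(sample_html))
```

In practice you would pass `response.text` from a `requests` call instead of `sample_html`.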

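### 2. **Collect the cookies with `requests` and `BeautifulSoup`**
   - **Write the code**: Extract the cookie values as shown below. The cookies come from the `requests` session; `BeautifulSoup` is imported for any HTML parsing you need, but it is not used for the cookies themselves.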

   ```python
   import requests
   from bs4 import BeautifulSoup

   # Create a session object
   session = requests.Session()

   # Request the web page (replace the URL with the page you want)
   response = session.get('https://finance.yahoo.com/video/three-reasons-why-market-due-143838206.html')

   # Get the cookies stored in the session
   cookies = session.cookies.get_dict()

   # Print the cookies (or extract only the values you need)
   print("Cookies:", cookies)
   ```

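   - **Use a session**: `requests.Session()` manages the session and keeps cookies across requests.
   - **Read the cookies**: `session.cookies.get_dict()` returns the cookies stored in the session as a dictionary.

### 3. **Download the video with `ffmpeg`**
   - **Include the cookie values in the `ffmpeg` command**: Pass the collected cookie values to `ffmpeg` so that it sends authenticated requests.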

   ```bash
   ffmpeg -headers "Cookie: session_id=abc123; other_cookie=value;" -i "https://your.m3u8.url/here" -c copy output.mp4
   ```

   - **Run the command**: Use this command to download the video stream.
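
To connect steps 2 and 3, here is a minimal sketch that turns a cookie dictionary (such as the one returned by `session.cookies.get_dict()`) into the `-headers` argument. The helper name `build_ffmpeg_command` and the cookie values are placeholders, not part of the original procedure.

```python
import shlex

def build_ffmpeg_command(cookies: dict[str, str], m3u8_url: str, output_path: str) -> list[str]:
    """Build an ffmpeg argument list that sends the cookies with each HTTP request."""
    cookie_pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
    # ffmpeg expects each custom header line to be terminated with CRLF.
    headers = f"Cookie: {cookie_pairs}\r\n"
    return ["ffmpeg", "-headers", headers, "-i", m3u8_url, "-c", "copy", output_path]

# Placeholder values, not real cookies or a real stream URL.
cookies = {"session_id": "abc123", "other_cookie": "value"}
command = build_ffmpeg_command(cookies, "https://your.m3u8.url/here", "output.mp4")

# Show the command as it would be typed in a shell.
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when cookie values contain spaces or quotes.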

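### 4. **Check the result**
   - **Check the video file**: Once the download finishes, open `output.mp4` to confirm that the video was saved correctly.

With this process, you can collect the cookies needed to download a cookie-protected video from a website and then download it with `ffmpeg`.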

In [10]:
# Get all article titles that share the same structure
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://news.ycombinator.com/news')

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all article titles
titles = soup.find_all('span', class_='titleline')

# Print the title text
for title in titles:
    print(title.get_text(strip=True))

Show HN: InstantDB – A Modern Firebase(github.com/instantdb)
Notris: A Tetris clone for the PlayStation 1(github.com/jbreckmckye)
Python's Preprocessor(pydong.org)
Aerc: A well-crafted TUI for email(sergeantbiggs.net)
Ordinals aren't much worse than Quaternions(philipzucker.com)
Batteryless OP-1(shred.zone)
Continuous reinvention: A brief history of block storage at AWS(allthingsdistributed.com)
Show HN: A Ghidra extension for exporting parts of a program as object files(github.com/boricj)
When Serial Isn't RS-232, Geocaching with the Garmin GPS 95(terinstock.com)
What is an SBAT and why does everyone suddenly care(mjg59.dreamwidth.org)
Rare yet Impactful – Orthographic Projection in Films and Animations(cined.com)
Launch HN: Arva AI (YC S24) – AI agents for instant global KYB onboarding
Launch HN: AnswerGrid (YC S24) – Web research tool for lead generation
DRAKON(wikipedia.org)
Ethernet History Deepdive – Why Do We Have Different Frame Types?(lostintransit.se)
What If Data Is a Bad Id

In [21]:
# Get a specific article title by index / extract the text only
# Get the second article title (index 1; the first is index 0)
second_title = titles[1].get_text(strip=True)
print(second_title)

second_link = titles[1].find('a')['href']
print("Link:", second_link)

Notris: A Tetris clone for the PlayStation 1(github.com/jbreckmckye)
Link: https://github.com/jbreckmckye/notris


In [14]:
# Get the article title together with its tags
second_title = titles[1]
print(second_title)

<span class="titleline"><a href="https://github.com/jbreckmckye/notris">Notris: A Tetris clone for the PlayStation 1</a><span class="sitebit comhead"> (<a href="from?site=github.com/jbreckmckye"><span class="sitestr">github.com/jbreckmckye</span></a>)</span></span>


In [20]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://news.ycombinator.com/news')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

article_tag = soup.find(name="span", class_="titleline").find("a")
article_text = article_tag.get_text()
article_link = article_tag['href'] # attribute access using a dictionary-style key

print(article_text, article_link)
# article_upvotes = soup.find(name="span", class_="score").get_text()

Show HN: InstantDB – A Modern Firebase https://github.com/instantdb/instant


In [None]:
# find_all returns all matching tags as a list, so you cannot call .find("a") on the result directly; call it on each element instead
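
The difference can be seen in a small self-contained sketch. The HTML string below is a made-up stand-in for the `titleline` markup printed in an earlier cell, so no network request is needed.

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the titleline markup shown in an earlier cell.
html = """
<span class="titleline"><a href="https://example.com/a">First story</a></span>
<span class="titleline"><a href="https://example.com/b">Second story</a></span>
"""
soup = BeautifulSoup(html, "html.parser")

titles = soup.find_all("span", class_="titleline")

# find_all returns a list-like ResultSet, so .find("a") is called per element.
links = [title.find("a")["href"] for title in titles]
print(links)

# find returns a single Tag, so chaining .find("a") works directly.
first_link = soup.find("span", class_="titleline").find("a")["href"]
print(first_link)
```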

In [23]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://news.ycombinator.com/news')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Create the empty lists first
article_texts = []
article_links = []

# Find all article elements
articles = soup.find_all(name="span", class_="titleline")

# Find all score elements (not used yet in this cell)
scores = soup.find_all('span', class_='score')

# Extract the text and link from each article and append them to the lists
for article in articles:
    article_tag = article.find("a")
    article_text = article_tag.get_text(strip=True)
    article_link = article_tag['href']

    article_texts.append(article_text)
    article_links.append(article_link)


# Print the final lists
print("Article titles:", article_texts)
print("Article links:", article_links)

Article titles: ['Show HN: InstantDB – A Modern Firebase', 'Notris: A Tetris clone for the PlayStation 1', "Python's Preprocessor", 'Aerc: A well-crafted TUI for email', 'Show HN: Kardinal – Building light-weight Kubernetes dev ephemeral environments', 'Batteryless OP-1', 'Continuous reinvention: A brief history of block storage at AWS', "Ordinals aren't much worse than Quaternions", "When Serial Isn't RS-232, Geocaching with the Garmin GPS 95", 'Objective Bayesian Hypothesis Testing', 'Show HN: A Ghidra extension for exporting parts of a program as object files', 'What is an SBAT and why does everyone suddenly care', 'Rare yet Impactful – Orthographic Projection in Films and Animations', 'DRAKON', 'What If Data Is a Bad Idea?', 'Launch HN: AnswerGrid (YC S24) – Web research tool for lead generation', 'Launch HN: Arva AI (YC S24) – AI agents for instant global KYB onboarding', 'GPU utilization can be a misleading metric', 'Ethernet History Deepdive – Why Do We Have Different Frame Types?', "Ma

In [None]:
scores = soup.find_all('span', class_='score')

for score in scores:
    points = score.get_text(strip=True).split()[0]  # keep only the first word (the number)
    print(points)

In [29]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://news.ycombinator.com/news')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Create the empty lists first
article_texts = []
article_links = []
article_scores = []

# Find all article elements
articles = soup.find_all(name="span", class_="titleline")

# Find all score elements
scores = soup.find_all('span', class_='score')

# Extract the text, link, and points from each article and append them to the lists
for i in range(len(articles)):
    article_tag = articles[i].find("a")
    article_text = article_tag.get_text(strip=True)
    article_link = article_tag['href']

    article_texts.append(article_text)
    article_links.append(article_link)

    # Extract the points; fall back to '0' when there are fewer score elements than titles
    # (e.g. ads have no score). Pairing by index assumes every earlier item has a score.
    if i < len(scores):
        points = scores[i].get_text(strip=True).split()[0]
        article_scores.append(points)
    else:
        article_scores.append('0')  # no score available: append '0' (or another placeholder)

# Print the final lists
print("Article titles:", article_texts)
print("Article links:", article_links)
print("Article points:", article_scores)

Article titles: ['Show HN: InstantDB – A Modern Firebase', 'Notris: A Tetris clone for the PlayStation 1', "Python's Preprocessor", 'Surfer is the first personal data scraper', 'Aerc: A well-crafted TUI for email', 'Show HN: Kardinal – Building light-weight Kubernetes dev ephemeral environments', 'Batteryless OP-1', 'Continuous reinvention: A brief history of block storage at AWS', 'Objective Bayesian Hypothesis Testing', "When Serial Isn't RS-232, Geocaching with the Garmin GPS 95", "Ordinals aren't much worse than Quaternions", 'Show HN: A Ghidra extension for exporting parts of a program as object files', 'What is an SBAT and why does everyone suddenly care', 'Rare yet Impactful – Orthographic Projection in Films and Animations', 'DRAKON', 'What If Data Is a Bad Idea?', 'Launch HN: AnswerGrid (YC S24) – Web research tool for lead generation', 'GPU utilization can be a misleading metric', 'Launch HN: Arva AI (YC S24) – AI agents for instant global KYB onboarding', 'Ethernet History Deepdive 

In [35]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://news.ycombinator.com/news')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Create the empty lists first
article_texts = []
article_links = []
article_scores = []

# Find all article elements
articles = soup.find_all(name="span", class_="titleline")

# Find all score elements
scores = soup.find_all('span', class_='score')

# Extract the text, link, and points from each article and append them to the lists
for i in range(len(articles)):
    article_tag = articles[i].find("a")
    article_text = article_tag.get_text(strip=True)
    article_link = article_tag['href']

    article_texts.append(article_text)
    article_links.append(article_link)

    # Extract the points; use 0 when there is no matching score element
    if i < len(scores):
        points = int(scores[i].get_text(strip=True).split()[0])  # convert the string to an integer here
        article_scores.append(points)
    else:
        article_scores.append(0)  # no score available: append 0

# Find the article with the highest points
max_points = max(article_scores)
max_index = article_scores.index(max_points)

# Print the final result
print("Highest-scoring article:")
print("Title:", article_texts[max_index])
print("Link:", article_links[max_index])
print("Points:", article_scores[max_index])

Highest-scoring article:
Title: Show HN: InstantDB – A Modern Firebase
Link: https://github.com/instantdb/instant
Points: 481
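
Pairing `articles[i]` with `scores[i]` only lines up if every listed item has a score. A possible alternative, sketched below, looks up each score in the row that follows its title row, so an item without a score does not shift the later ones. The HTML here is a simplified, assumed imitation of the Hacker News table layout, not a copy of the live page.

```python
from bs4 import BeautifulSoup

# Simplified, assumed layout: a title row followed by a detail row that may hold a score.
html = """
<table>
<tr class="athing"><td><span class="titleline"><a href="https://example.com/a">Story with points</a></span></td></tr>
<tr><td class="subtext"><span class="score">120 points</span></td></tr>
<tr class="athing"><td><span class="titleline"><a href="https://example.com/job">Job post without points</a></span></td></tr>
<tr><td class="subtext"><span class="age">1 hour ago</span></td></tr>
<tr class="athing"><td><span class="titleline"><a href="https://example.com/c">Another story</a></span></td></tr>
<tr><td class="subtext"><span class="score">481 points</span></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

stories = []
for title_span in soup.find_all("span", class_="titleline"):
    link = title_span.find("a")
    # Walk up to the title row, then across to the row that follows it.
    title_row = title_span.find_parent("tr")
    detail_row = title_row.find_next_sibling("tr")
    score_tag = detail_row.find("span", class_="score") if detail_row else None
    # Items without a score element get 0 instead of borrowing a neighbour's score.
    points = int(score_tag.get_text(strip=True).split()[0]) if score_tag else 0
    stories.append((link.get_text(strip=True), link["href"], points))

top_story = max(stories, key=lambda story: story[2])
print(top_story)
```

Keeping title, link, and points together in one tuple also removes the need for three parallel lists.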


In [46]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://web.archive.org/web/20200518073855/https://www.empireonline.com/movies/features/best-movies-2/')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Find all movie title elements
movies = soup.find_all(name="h3", class_="title")  # renamed to movies

# Print each movie title
for movie in movies:
    print(movie.get_text(strip=True))

100) Stand By Me
99) Raging Bull
98) Amelie
97) Titanic
96) Good Will Hunting
95) Arrival
94) Lost In Translation
93) The Princess Bride
92) The Terminator
91) The Prestige
90) No Country For Old Men
89) Shaun Of The Dead
88) The Exorcist
87) Predator
86) Indiana Jones And The Last Crusade
85) Léon
84) Rocky
83) True Romance
82) Some Like It Hot
81) The Social Network
15) Spirited Away
79) Captain America: Civil War
78) Oldboy
77) Toy Story
76) A Clockwork Orange
75) Fargo
74) Mulholland Dr.
73) Seven Samurai
72) Rear Window
71) Hot Fuzz
70) The Lion King
69) Singin' In The Rain
68) Ghostbusters
67) Memento
66) Return Of The Jedi
65) Avengers Assemble
64) L.A. Confidential
63) Donnie Darko
62) La La Land
61) Forrest Gump
60) American Beauty
59) E.T. – The Extra Terrestrial
58) Inglourious Basterds
57) Whiplash
56) Reservoir Dogs
55) Pan's Labyrinth
54) Vertigo
53) Psycho
52) Once Upon A Time In The West
51) It's A Wonderful Life
50) Lawrence Of Arabia
49) Trainspotting
48) The Silen
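
Non-ASCII titles such as "Léon" can come out garbled when the page bytes are decoded with the wrong codec. One plausible cause, assuming the server did not declare a charset, is that `response.text` falls back to ISO-8859-1 while the page is actually UTF-8. The sketch below reproduces that effect with plain strings and lists possible fixes as comments.

```python
title = "85) Léon"

# UTF-8 bytes, as a server would send them.
raw_bytes = title.encode("utf-8")

# Decoding those bytes as ISO-8859-1 (Latin-1) produces the typical garbled form.
wrong = raw_bytes.decode("iso-8859-1")
print(wrong)

# Decoding with the correct codec restores the text.
right = raw_bytes.decode("utf-8")
print(right)

# With requests, possible fixes are (not executed here):
#   response.encoding = "utf-8"                            # set before reading response.text
#   response.encoding = response.apparent_encoding         # let requests guess from the body
#   soup = BeautifulSoup(response.content, "html.parser")  # let BeautifulSoup detect it from bytes
```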

In [48]:
from bs4 import BeautifulSoup
import requests

# Fetch the web page
response = requests.get('https://www.empireonline.com/movies/features/best-movies-2/')

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Find all movie title elements (using the long generated class name)
movies = soup.find_all(name="h3", class_="listicleItem_listicle-item__title__BfenH")

# Print each movie title
for movie in movies[::-1]:  # reverse order
    print(movie.get_text(strip=True))

1) The Lord Of The Rings: The Fellowship Of The Ring
2) Star Wars: The Empire Strikes Back
3) The Godfather
4) The Dark Knight
5) The Shawshank Redemption
6) Jaws
7) Pulp Fiction
8) Avengers: Infinity War
9) Raiders Of The Lost Ark
10) Goodfellas
11) Star Wars
12) Mad Max: Fury Road
13) Back To The Future
14) The Godfather Part II
15) Jurassic Park
16) Blade Runner
17) Aliens
18) Parasite
19) Inception
20) The Matrix
21) Alien
22) Avengers: Endgame
23) 2001: A Space Odyssey
24) Terminator 2 Judgment Day
25) Fight Club
26) Die Hard
27) The Lord Of The Rings The Return Of The King
28) Indiana Jones And The Last Crusade
29) Apocalypse Now
30) Heat
31) Interstellar
32) The Thing
33) Casablanca
34) The Lord Of The Rings: The Two Towers
35) The Shining
36) Eternal Sunshine Of The Spotless Mind
37) Se7en
38) The Good, The Bad And The Ugly
39) Gladiator
40)  Citizen Kane
41) The Silence Of The Lambs
42) 12 Angry Men
43) There Will Be Blood
44) It's A Wonderful Life
45) The Big Lebowski
46) Sch

In [49]:
movie_titles = [movie.get_text(strip=True) for movie in movies]
for n in range(len(movie_titles) - 1, -1, -1):
    print(movie_titles[n])

1) The Lord Of The Rings: The Fellowship Of The Ring
2) Star Wars: The Empire Strikes Back
3) The Godfather
4) The Dark Knight
5) The Shawshank Redemption
6) Jaws
7) Pulp Fiction
8) Avengers: Infinity War
9) Raiders Of The Lost Ark
10) Goodfellas
11) Star Wars
12) Mad Max: Fury Road
13) Back To The Future
14) The Godfather Part II
15) Jurassic Park
16) Blade Runner
17) Aliens
18) Parasite
19) Inception
20) The Matrix
21) Alien
22) Avengers: Endgame
23) 2001: A Space Odyssey
24) Terminator 2 Judgment Day
25) Fight Club
26) Die Hard
27) The Lord Of The Rings The Return Of The King
28) Indiana Jones And The Last Crusade
29) Apocalypse Now
30) Heat
31) Interstellar
32) The Thing
33) Casablanca
34) The Lord Of The Rings: The Two Towers
35) The Shining
36) Eternal Sunshine Of The Spotless Mind
37) Se7en
38) The Good, The Bad And The Ugly
39) Gladiator
40)  Citizen Kane
41) The Silence Of The Lambs
42) 12 Angry Men
43) There Will Be Blood
44) It's A Wonderful Life
45) The Big Lebowski
46) Sch

In [2]:
import os
import re
import requests
from bs4 import BeautifulSoup

# Fetch the web page
response = requests.get('https://www.empireonline.com/movies/features/best-movies-2/')
soup = BeautifulSoup(response.text, 'html.parser')

# Create the download folder
os.makedirs('images', exist_ok=True)

# Find the container of each movie (title and image)
movies = soup.find_all(name="div", class_="listicle_listicle__item__CJna4")

# Print and download each movie's title and image URL
for index, movie in enumerate(movies[::-1], start=1):  # reverse order, with the index starting at 1
    title = movie.find("h3", class_="listicleItem_listicle-item__title__BfenH").get_text(strip=True)
    img_tag = movie.find("img")
    img_url = img_tag["srcset"].split(", ")[-1].split(" ")[0]  # last srcset entry, assumed to be the largest image

    # Build the file name, replacing characters that are unsafe in file names (e.g. ':')
    safe_title = re.sub(r'[^\w.-]+', '_', title)
    img_name = f'images/{index:02d}_{safe_title}.jpg'

    # Download and save the image
    img_data = requests.get(img_url).content
    with open(img_name, 'wb') as handler:
        handler.write(img_data)

    # Print progress
    print(f"Title: {index}) {title}, Image URL: {img_url}")

print("All images were downloaded successfully.")

Title: 1) 1) The Lord Of The Rings: The Fellowship Of The Ring, Image URL: https://images.bauerhosting.com/legacy/media/619d/bfe9/5909/d08f/544c/892d/1%20Fellowship.jpeg?auto=format&w=1440&q=80
Title: 2) 2) Star Wars: The Empire Strikes Back, Image URL: https://images.bauerhosting.com/legacy/media/619d/bf59/3ebe/4721/a59c/e4be/2%20ESB.jpg?auto=format&w=1440&q=80
Title: 3) 3) The Godfather, Image URL: https://images.bauerhosting.com/legacy/media/619d/be32/5165/433b/cc3b/7c8f/3%20Godfather.jpg?auto=format&w=1440&q=80
Title: 4) 4) The Dark Knight, Image URL: https://images.bauerhosting.com/legacy/media/619d/bd81/3ebe/47b5/fa9c/e4ac/4%20dark%20knight.jpg?auto=format&w=1440&q=80
Title: 5) 5) The Shawshank Redemption, Image URL: https://images.bauerhosting.com/legacy/media/619d/bcdd/5165/4383/223b/7c83/5%20Shawshank.jpg?auto=format&w=1440&q=80
Title: 6) 6) Jaws, Image URL: https://images.bauerhosting.com/legacy/media/619d/bc72/3ebe/47f1/829c/e4a1/6%20Jaws.jpg?auto=format&w=1440&q=80
Title: 7
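
Taking the last `srcset` entry works when the entries are listed in ascending size, which is an assumption about the page. A slightly more defensive sketch compares the width descriptors instead. The helper name `largest_from_srcset` and the sample value are hypothetical.

```python
def largest_from_srcset(srcset: str) -> str:
    """Return the URL with the largest width descriptor (e.g. '1440w') in a srcset."""
    best_url, best_width = "", -1
    # Note: this simple split assumes the URLs themselves contain no commas.
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url = parts[0]
        # Width descriptors look like '640w'; anything else counts as width 0.
        width = 0
        if len(parts) > 1 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            width = int(parts[1][:-1])
        if width > best_width:
            best_url, best_width = url, width
    return best_url

# Hypothetical srcset value; the entries are deliberately not in ascending order.
srcset = (
    "https://img.example.com/poster.jpg?w=1440 1440w, "
    "https://img.example.com/poster.jpg?w=320 320w, "
    "https://img.example.com/poster.jpg?w=640 640w"
)
print(largest_from_srcset(srcset))
```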

In [None]:
# Look for 'data-src' or 'srcset' instead of 'src'. ### Image download
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

# 1. Fetch the HTML of the web page
url = "https://www.empireonline.com/movies/features/best-movies-2/"
response = requests.get(url)
html_content = response.text

# 2. Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# 3. Find all img tags
images = soup.find_all('img')

# 4. Set the directory where images are saved
save_dir = "downloaded_images"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

# 5. Download each image
for idx, img in enumerate(images):
    # Prefer 'data-src' or 'srcset' over 'src' when they are present
    img_url = img.get('data-src') or img.get('srcset') or img.get('src')

    if img_url:
        # A srcset may list several resolutions separated by commas; take the last one
        if ',' in img_url:
            img_url = img_url.split(',')[-1].strip().split(' ')[0]

        # Extract the image extension from the URL path, ignoring any query string
        img_ext = os.path.splitext(urlparse(img_url).path)[1]
        if not img_ext:
            img_ext = '.jpg'  # default to .jpg when there is no extension

        # Download the image
        try:
            img_response = requests.get(img_url)
            if img_response.status_code == 200:
                img_data = img_response.content
                img_filename = os.path.join(save_dir, f'image_{idx + 1}{img_ext}')
                with open(img_filename, 'wb') as f:
                    f.write(img_data)
                print(f"Downloaded: {img_filename}")
            else:
                print(f"Failed to download: {img_url}")
        except Exception as e:
            print(f"Error downloading {img_url}: {e}")
    else:
        print("No src found for an image tag")

In [14]:
# Extract the Billboard Hot 100 with a CSS selector / select ####
from bs4 import BeautifulSoup
import requests

date = input("Which year do you want to travel to? Type the date in this format YYYY-MM-DD: ")

response = requests.get("https://www.billboard.com/charts/hot-100/" + date)
soup = BeautifulSoup(response.text, 'html.parser')

# Selector updated to match the current page structure
song_names_spans = soup.select("li.o-chart-results-list__item h3#title-of-a-story")
song_names = [song.get_text().strip() for song in song_names_spans]

for song in song_names:
    print(song)

Incomplete
Bent
Jumpin', Jumpin'
It's Gonna Be Me
Doesn't Really Matter
Try Again
Absolutely (Story Of A Girl)
I Wanna Know
Everything You Want
No More
I Need You
Higher
He Wasn't Man Enough
Kryptonite
(Hot S**t) Country Grammar
Back Here
Let's Get Married
Desert Rose
Wifey
There You Go
I Think I'm In Love With You
Breathe
Music
Wonderful
You Sang To Me
Purest Of Pain (A Puro Dolor)
That's The Way
What'Chu Like
I Turn To You
What About Now
Separated
The Next Episode
Big Pimpin'
I Wanna Be With You
Faded
I Hope You Dance
I Will Love Again
Prayin' For Daylight
It Must Be Love
Callin' Me
Lucky
Just Be A Man About It
Smooth
One Voice
Come On Over Baby (All I Want Is You)
I Try
Whatever
Bounce With Me
Where I Wanna Be
Amazed
Give Me Just One Night (Una Noche)
I Will...But
The Light
Dance Tonight
Simple Kind Of Life
Could I Have This Kiss Forever
Flowers On The Wall
It's My Life
Yes!
What You Want
Taking You Home
You'll Always Be Loved By Me
Treat Her Like A Lady
Your Everything
Who Let The 

In [16]:
# Extraction by id
from bs4 import BeautifulSoup
import requests

date = input("Which year do you want to travel to? Type the date in this format YYYY-MM-DD: ")

response = requests.get("https://www.billboard.com/charts/hot-100/" + date)
soup = BeautifulSoup(response.text, 'html.parser')

# This matches every h3 with that id and class, so non-song headings are included too
song_names_spans = soup.find_all("h3", id="title-of-a-story", class_="c-title")
# Equivalent form using attrs:
# song_names_spans = soup.find_all("h3", attrs={"id": "title-of-a-story", "class": "c-title"})

song_names = [song.get_text().strip() for song in song_names_spans]
for song in song_names:
    print(song)

Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Lizard Eliminated After 'Sink or Swim' Experience on 'The Masked Singer': 'I'm Not Used to Losing'
Gains in Weekly Performance
Additional Awards
Incomplete
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Bent
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Jumpin', Jumpin'
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
It's Gonna Be Me
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Doesn't Really Matter
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Try Again
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Absolutely (Story Of A Girl)
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
I Wanna Know
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Everything You Want
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
No More
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
I Need You
Songwriter(s):
Producer(s):
Imprint/Promotion Label:
Higher
Songwriter(s):
Producer(s):
Imprint/Promotion Lab
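
The extra lines in this output appear because the same id is reused on headings that are not song titles. The sketch below contrasts the two approaches on a simplified, hypothetical fragment; the real Billboard markup is more deeply nested.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: the same id appears on song titles and on other headings.
html = """
<h3 id="title-of-a-story" class="c-title">Unrelated news headline</h3>
<ul>
  <li class="o-chart-results-list__item">
    <h3 id="title-of-a-story" class="c-title"> Incomplete </h3>
  </li>
  <li class="o-chart-results-list__item">
    <h3 id="title-of-a-story" class="c-title"> Bent </h3>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Matching on the id alone returns every heading that carries it.
by_id = [h3.get_text(strip=True) for h3 in soup.find_all("h3", id="title-of-a-story")]
print(by_id)

# Requiring the chart list item as an ancestor narrows the match to song titles.
by_selector = [
    h3.get_text(strip=True)
    for h3 in soup.select("li.o-chart-results-list__item h3#title-of-a-story")
]
print(by_selector)
```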

In [None]:
# Extraction by class
from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.amazon.com/s?k=gaming+headsets&_encoding=UTF8&content-id=amzn1.sym.12129333-2117-4490-9c17-6d31baf0582a&pd_rd_r=9c194fce-bb0b-4253-9b14-1bc555174918&pd_rd_w=KjGf5&pd_rd_wg=4DxIJ&pf_rd_p=12129333-2117-4490-9c17-6d31baf0582a&pf_rd_r=0DPAVZEHC9WBAW0ANN4R&ref=pd_hp_d_atf_unk")
soup = BeautifulSoup(response.text, 'html.parser')

# find returns None when the element is missing (e.g. when the request is blocked)
price_tag = soup.find("span", class_="a-offscreen")
print(price_tag.get_text(strip=True) if price_tag else "Price not found")


In [22]:
# Code that failed
url="https://www.amazon.com/s?k=gaming+headsets&_encoding=UTF8&content-id=amzn1.sym.12129333-2117-4490-9c17-6d31baf0582a&pd_rd_r=9c194fce-bb0b-4253-9b14-1bc555174918&pd_rd_w=KjGf5&pd_rd_wg=4DxIJ&pf_rd_p=12129333-2117-4490-9c17-6d31baf0582a&pf_rd_r=0DPAVZEHC9WBAW0ANN4R&ref=pd_hp_d_atf_unk"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

price_elements = soup.select("span.a-price span.a-offscreen")
for price in price_elements:
    print(price.get_text(strip=True))

In [None]:
from bs4 import BeautifulSoup
import requests
import lxml

# Set the URL and headers
url="https://www.amazon.com/s?k=gaming+headsets&_encoding=UTF8&content-id=amzn1.sym.12129333-2117-4490-9c17-6d31baf0582a&pd_rd_r=9c194fce-bb0b-4253-9b14-1bc555174918&pd_rd_w=KjGf5&pd_rd_wg=4DxIJ&pf_rd_p=12129333-2117-4490-9c17-6d31baf0582a&pf_rd_r=0DPAVZEHC9WBAW0ANN4R&ref=pd_hp_d_atf_unk"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)

# Use the lxml parser when creating the BeautifulSoup object
soup = BeautifulSoup(response.text, 'lxml')

# Extract the desired data (e.g. price information)
price_elements = soup.select("span.a-price span.a-offscreen")
for price in price_elements:
    print(price.get_text(strip=True))

In [None]:
# The request gets mistaken for a bot
import requests
import lxml
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B075CYMYK6?psc=1&ref_=cm_sw_r_cp_ud_ct_FM9M699VKHTT47YD50Q6"
header = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"
}

response = requests.get(url, headers=header)

soup = BeautifulSoup(response.content, "lxml")
print(soup.prettify())

price = soup.find(class_="a-offscreen").get_text()
price_without_currency = price.split("$")[1]
price_as_float = float(price_without_currency)
print(price_as_float)
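
`price.split("$")[1]` followed by `float(...)` raises an error when the text has no `$` or contains a thousands separator such as `1,299.00`. A more tolerant sketch is shown below. The helper name `parse_price` is hypothetical, and it assumes US-style formatting (comma for thousands, dot for decimals).

```python
import re
from typing import Optional

def parse_price(text: Optional[str]) -> Optional[float]:
    """Extract the first number from a price string such as '$1,299.99'."""
    if not text:
        return None
    # Digits with optional thousands separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("$59.99"))
print(parse_price("$1,299.00"))
print(parse_price("Price not found"))
```

It can be combined with a `None` check on the result of `soup.find(...)` before calling `get_text()`.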

In [None]:
# Select the element by passing its attributes as a dictionary of key-value pairs
import requests
from bs4 import BeautifulSoup

# Target URL
url = 'https://www.amazon.com/SAMSUNG-Android-Included-Expandable-Exclusive/dp/B0CWS8MNW1?ref=dlx_deals_dg_dcl_B0CWS8MNW1_dt_sl14_f0&th=1'

# Set the headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

# Request and parse the web page
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Select the span element that contains the price text
price_section = soup.find('span', {'class': 'aok-offscreen'})

# Print the result
if price_section:
    print(price_section.text.strip())
else:
    print("Price not found")