Langchain had an issue. #184

@jaelliot

Describe the bug

LangChain failed to parse the model's response as valid JSON; the script then crashed with an OutputParserException ("Invalid json output").

To Reproduce
Steps to reproduce the behavior:

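  1. Go to the pet poison resource page.
  2. Try to scrape the page with the script (python3 scrapegraphAI.py).
  3. The run fails with "OutputParserException: Invalid json output".
  4. See the error output below.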

Expected behavior
I expected to scrape the page and get JSONL output; instead, LangChain raised an exception while parsing the model's response.

Here's the full error output:

(scrapeai) jay@DESKTOP-N3AJ5U8:~/Github/ScrapeGraphAI$ python3 scrapegraphAI.py 
2024-05-08 20:11:10,274 - DEBUG - [selector_events.__init__:54] - Using selector: EpollSelector
2024-05-08 20:11:10,292 - INFO - [chromium.ascrape_playwright:50] - Starting scraping...
2024-05-08 20:11:13,686 - INFO - [chromium.ascrape_playwright:58] - Content scraped
2024-05-08 20:11:14,537 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:17,607 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:17,608 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:17,990 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:17,991 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:18,477 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:18,478 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:18,877 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:18,878 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,307 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,308 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,477 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,478 - DEBUG - [loader.<module>:62] - Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
2024-05-08 20:11:19,478 - INFO - [loader.<module>:86] - Loading faiss with AVX2 support.
2024-05-08 20:11:19,488 - INFO - [loader.<module>:88] - Successfully loaded faiss with AVX2 support.
2024-05-08 20:11:19,494 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,567 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,569 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,767 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,768 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:20,137 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:20,138 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:20,580 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:20,581 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,007 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:21,008 - DEBUG - [math.cosine_similarity:34] - Unable to import simsimd, defaulting to NumPy implementation. If you want to use simsimd please install with `pip install simsimd`.
2024-05-08 20:11:21,009 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,083 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:21,084 - DEBUG - [math.cosine_similarity:34] - Unable to import simsimd, defaulting to NumPy implementation. If you want to use simsimd please install with `pip install simsimd`.
2024-05-08 20:11:21,165 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,168 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,170 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,175 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:26,483 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:11:49,896 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:13,785 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:44,973 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:44,975 - ERROR - [scrapegraphAI.<module>:112] - An error occurred during scraping: Invalid json output: 
Traceback (most recent call last):
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/json.py", line 66, in parse_result
    return parse_json_markdown(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 147, in parse_json_markdown
    return _parse_json(json_str, parser=parser)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 160, in _parse_json
    return parser(json_str)
           ^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 120, in parse_partial_json
    return json.loads(s, strict=strict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jay/Github/ScrapeGraphAI/scrapegraphAI.py", line 81, in <module>
    result = smart_scraper_graph.run()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 116, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 107, in execute
    result = current_node.execute(state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/nodes/generate_answer_node.py", line 133, in execute
    answer = map_chain.invoke({"question": user_prompt})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3142, in invoke
    output = {key: future.result() for key, future in zip(steps, futures)}
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3142, in <dictcomp>
    output = {key: future.result() for key, future in zip(steps, futures)}
                   ^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 169, in invoke
    return self._call_with_config(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1626, in _call_with_config
    context.run(
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 347, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 170, in <lambda>
    lambda inner_input: self.parse_result(
                        ^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/json.py", line 69, in parse_result
    raise OutputParserException(msg, llm_output=text) from e
langchain_core.exceptions.OutputParserException: Invalid json output: 
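
For what it's worth, the final `JSONDecodeError` ("Expecting value: line 1 column 1 (char 0)") is the same message `json.loads` gives for an empty string, and the "Invalid json output:" message above has nothing after the colon. That would be consistent with the model returning an empty response, though I haven't confirmed it. Below is a minimal sketch using only the standard library; `describe_json_failure` is a hypothetical helper I made up for illustration and is not part of LangChain or ScrapeGraphAI.

```python
import json


def describe_json_failure(text):
    """Return None if text parses as JSON, else a short diagnostic string.

    Hypothetical helper for illustration only; not a LangChain or
    ScrapeGraphAI function.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Distinguish "nothing came back" from "something came back but is broken".
        if not text.strip():
            return f"empty model output ({exc})"
        return f"malformed JSON ({exc})"
    return None


# An empty string triggers the same decoder message seen in the traceback.
print(describe_json_failure(""))
# A well-formed object parses cleanly, so the helper returns None.
print(describe_json_failure('{"name": "lily"}'))
```

Running a check like this on the raw model response before it reaches the JSON parser would at least show whether the response was empty or just badly formatted.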

Desktop:

  • OS: Windows
  • Browser: N/A
  • Version: Unknown; I installed scrapegraphai yesterday



Labels: bug (Something isn't working)
