LangChain failed to parse the model's output as valid JSON, and the script crashed.
(scrapeai) jay@DESKTOP-N3AJ5U8:~/Github/ScrapeGraphAI$ python3 scrapegraphAI.py
2024-05-08 20:11:10,274 - DEBUG - [selector_events.__init__:54] - Using selector: EpollSelector
2024-05-08 20:11:10,292 - INFO - [chromium.ascrape_playwright:50] - Starting scraping...
2024-05-08 20:11:13,686 - INFO - [chromium.ascrape_playwright:58] - Content scraped
2024-05-08 20:11:14,537 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:17,607 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:17,608 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:17,990 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:17,991 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:18,477 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:18,478 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:18,877 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:18,878 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,307 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,308 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,477 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,478 - DEBUG - [loader.<module>:62] - Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
2024-05-08 20:11:19,478 - INFO - [loader.<module>:86] - Loading faiss with AVX2 support.
2024-05-08 20:11:19,488 - INFO - [loader.<module>:88] - Successfully loaded faiss with AVX2 support.
2024-05-08 20:11:19,494 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,567 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,569 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:19,767 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:19,768 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:20,137 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:20,138 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:20,580 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:20,581 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,007 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:21,008 - DEBUG - [math.cosine_similarity:34] - Unable to import simsimd, defaulting to NumPy implementation. If you want to use simsimd please install with `pip install simsimd`.
2024-05-08 20:11:21,009 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,083 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/embeddings HTTP/1.1" 200 None
2024-05-08 20:11:21,084 - DEBUG - [math.cosine_similarity:34] - Unable to import simsimd, defaulting to NumPy implementation. If you want to use simsimd please install with `pip install simsimd`.
2024-05-08 20:11:21,165 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,168 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,170 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:21,175 - DEBUG - [connectionpool._new_conn:244] - Starting new HTTP connection (1): localhost:11434
2024-05-08 20:11:26,483 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:11:49,896 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:13,785 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:44,973 - DEBUG - [connectionpool._make_request:549] - http://localhost:11434 "POST /api/chat HTTP/1.1" 200 None
2024-05-08 20:12:44,975 - ERROR - [scrapegraphAI.<module>:112] - An error occurred during scraping: Invalid json output:
Traceback (most recent call last):
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/json.py", line 66, in parse_result
    return parse_json_markdown(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 147, in parse_json_markdown
    return _parse_json(json_str, parser=parser)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 160, in _parse_json
    return parser(json_str)
           ^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/utils/json.py", line 120, in parse_partial_json
    return json.loads(s, strict=strict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jay/Github/ScrapeGraphAI/scrapegraphAI.py", line 81, in <module>
    result = smart_scraper_graph.run()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 116, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 107, in execute
    result = current_node.execute(state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/scrapegraphai/nodes/generate_answer_node.py", line 133, in execute
    answer = map_chain.invoke({"question": user_prompt})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3142, in invoke
    output = {key: future.result() for key, future in zip(steps, futures)}
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3142, in <dictcomp>
    output = {key: future.result() for key, future in zip(steps, futures)}
                   ^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2499, in invoke
    input = step.invoke(
            ^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 169, in invoke
    return self._call_with_config(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1626, in _call_with_config
    context.run(
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 347, in call_func_with_variable_args
    return func(input, **kwargs) # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 170, in <lambda>
    lambda inner_input: self.parse_result(
                        ^^^^^^^^^^^^^^^^^^
  File "/home/jay/anaconda3/envs/scrapeai/lib/python3.11/site-packages/langchain_core/output_parsers/json.py", line 69, in parse_result
    raise OutputParserException(msg, llm_output=text) from e
langchain_core.exceptions.OutputParserException: Invalid json output:
Describe the bug
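LangChain's JSON output parser could not parse the model's reply. json.loads failed with "Expecting value: line 1 column 1 (char 0)", and the parser then raised OutputParserException: Invalid json output. Nothing is printed after "Invalid json output:", which may mean the reply was empty or did not begin with JSON. The exception came out of smart_scraper_graph.run() (via generate_answer_node.py), and the script failed without returning a result.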
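The innermost error in the traceback is the message Python's json module gives when the text does not start with a JSON value, for example when it is empty or opens with ordinary prose. Below is a minimal standard-library sketch of that behavior. The helper name describe_json_failure and the sample strings are made up for illustration; they are not the real model output from this run.

```python
import json


def describe_json_failure(text):
    """Return the decoder's error message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Hypothetical model replies, not taken from the actual run.
samples = ["", "Sure! Here is the data you asked for.", '{"title": "ok"}']
for sample in samples:
    print(repr(sample), "->", describe_json_failure(sample))
```

Both the empty string and the prose reply fail at line 1 column 1 (char 0), so the message alone does not tell the two cases apart; logging the raw reply would.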
To Reproduce
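Steps to reproduce the behavior: run python3 scrapegraphAI.py from the scrapeai conda environment, with the model served locally on localhost:11434 (as in the log above).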
Expected behavior
I expected to scrape the page and get some jsonl back; instead, LangChain raised an OutputParserException and the run failed.
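One way a script could degrade more gracefully is to pull the first JSON value out of a reply that wraps it in prose or a Markdown fence, and to return None when there is nothing to parse. The sketch below uses only the standard library. extract_json is a hypothetical helper, not part of LangChain or ScrapeGraphAI, and it assumes the raw reply text is available to the caller.

```python
import json
import re


def extract_json(text):
    """Best effort: return the first JSON object or array found in text, else None."""
    decoder = json.JSONDecoder()
    # Try every '{' or '[' as a possible start of a JSON value.
    for match in re.finditer(r"[{\[]", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
            return value
        except json.JSONDecodeError:
            continue  # Not valid JSON from this position; try the next candidate.
    return None


# Hypothetical replies for illustration only.
fenced_reply = 'Here you go:\n```json\n{"items": [1, 2]}\n```'
print(extract_json(fenced_reply))
print(extract_json(""))
```

An empty reply still yields None here, so the caller would need to decide whether to retry the model call or report the failure.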
The full log and traceback from the run are shown above.
Desktop: Linux shell on host DESKTOP-N3AJ5U8, Python 3.11 in the Anaconda environment scrapeai (taken from the prompt and paths in the log above).