
Skipping partition as no new files detected #18

Open
RomeLeader opened this issue Sep 25, 2019 · 3 comments

RomeLeader commented Sep 25, 2019

Hi,

My log bucket is fairly large, but we have transitioned anything older than three months to Glacier. When I run the job, it completes in a minute or two and I get the following:

19/09/25 13:26:02 WARN HadoopDataSource: Skipping Partition {} as no new files detected @ s3://<BUCKET>/ or path does not exist

where <BUCKET> is the name of my S3 access log storage bucket.

My logs are saved at the top level of the S3 bucket, i.e. all log files are at s3://<BUCKET>/

What could be happening here? I know there are logs in the bucket (they are not partitioned), and the converted DB/tables are empty when I preview them. I have set the classification of the raw-data table to CSV, but I am not sure whether that is correct.

Any pointers would be appreciated!
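
For anyone else who lands here: this warning is typically emitted when Glue job bookmarks are enabled and the source reports no files newer than the last run; objects already transitioned to Glacier are also not directly readable by a Glue job until restored. Below is a minimal sketch of the relevant read call, assuming a Glue PySpark job with hypothetical database/table names; the transformation_ctx parameter is what ties a source to the bookmark state.

```python
# Minimal sketch of a Glue PySpark source read. Database/table names
# are hypothetical placeholders. With job bookmarks ENABLED, passing a
# transformation_ctx makes Glue track which files were already read;
# files seen in a previous run are then skipped with the
# "Skipping Partition ... as no new files detected" warning.
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Omitting transformation_ctx (or running the job with
# --job-bookmark-option job-bookmark-disable) forces a full re-read.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="access_logs_db",          # hypothetical name
    table_name="raw_access_logs",       # hypothetical name
    # transformation_ctx="read_logs",   # enables bookmark tracking
)
print(f"records read: {dyf.count()}")
```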


MarcusElwin commented Jun 30, 2022

We hit a similar issue: when a file is not in S3, an empty DataFrame is still created. Shouldn't this raise an exception?

22/06/30 08:52:18 WARN HadoopDataSource: Skipping Partition {} as no new files detected @ s3://sample-bucket/test/dict_most_common_names_old.csv or path does not exist
Empty DataFrame
Columns: []
Index: []
<class 'pandas.core.frame.DataFrame'>
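
One way to make this fail loudly is to check for the object before reading. A minimal sketch, assuming boto3 is available and reusing the bucket/key from the log line above purely for illustration:

```python
# Minimal sketch: verify the S3 object exists before reading, so a
# missing file raises instead of silently yielding an empty DataFrame.
# Bucket/key are taken from the log line above for illustration only.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket, key = "sample-bucket", "test/dict_most_common_names_old.csv"

try:
    s3.head_object(Bucket=bucket, Key=key)
except ClientError as err:
    # head_object reports a missing key as HTTP 404
    if err.response["Error"]["Code"] == "404":
        raise FileNotFoundError(f"s3://{bucket}/{key} does not exist") from err
    raise
```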


MyJBMe commented Apr 7, 2023

I experienced the same error. It turned out my Glue job simply did not have enough permissions, so you may want to check the role assigned to your job.
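
To narrow down which permissions are missing, one option is to probe S3 with the job role's own credentials. A minimal sketch, assuming boto3 and a placeholder bucket name; a Glue source read generally needs at least s3:ListBucket on the bucket and s3:GetObject on its objects:

```python
# Minimal sketch: confirm the Glue job's role can list and read the
# bucket. Run inside the job so it uses the job's credentials; if these
# calls fail with AccessDenied, the role lacks s3:ListBucket/s3:GetObject.
import boto3
from botocore.exceptions import ClientError

bucket = "sample-bucket"  # placeholder

s3 = boto3.client("s3")
try:
    resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=1)
    sample = resp["Contents"][0]["Key"] if resp.get("Contents") else "(empty)"
    print("list OK, sample key:", sample)
except ClientError as err:
    print("S3 access check failed:", err.response["Error"]["Code"])
```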

TLazarevic commented

What permissions were you missing @MyJBMe?
