
using the dataset #2

Closed
DenisSouth opened this issue Mar 8, 2019 · 6 comments

DenisSouth commented Mar 8, 2019

I downloaded this dataset: https://www.kaggle.com/sophatvathana/casia-dataset
but it has no description at all, and the folder tree is strange:

├───CASIA1
│ ├───Au
│ ├───ela
│ └───Sp
├───CASIA2
│ ├───Au
│ └───Tp
└───__MACOSX
  ├───CASIA1
  │ ├───Au
  │ └───Sp
  └───CASIA2
    ├───Au
    └───Tp

Which folders should I use for training, and which for testing? Which ones hold the original pictures and which the modified ones?
I also know the CSV format:

file_name,1 or 0 (1 = fake image, 0 = real image)
Example for a real image:
'datasets/train/real/Au_ani_00001.jpg',0

but I have no idea which folder I should use as the source...

I appreciate your great work, and I want to reproduce it myself :-)
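For reference, the `path,label` format described above can be written and read back with Python's standard `csv` module, which also quotes any path that happens to contain a comma (plain string concatenation would not). A minimal sketch; `write_rows` and `read_rows` are hypothetical helper names, not part of this project:

```python
import csv
import io

def write_rows(rows, f):
    """Write (path, label) pairs as 'path,label' lines (label 0 = real, 1 = fake)."""
    writer = csv.writer(f, lineterminator='\n')
    writer.writerows([path, label] for path, label in rows)

def read_rows(f):
    """Parse 'path,label' lines back into (path, int(label)) tuples."""
    return [(path, int(label)) for path, label in csv.reader(f)]

# Round-trip the example row from the format description
buf = io.StringIO()
write_rows([('datasets/train/real/Au_ani_00001.jpg', 0)], buf)
print(buf.getvalue(), end='')  # datasets/train/real/Au_ani_00001.jpg,0
buf.seek(0)
print(read_rows(buf))  # [('datasets/train/real/Au_ani_00001.jpg', 0)]
```

When writing to a real file instead of `io.StringIO`, open it with `newline=''`, as the `csv` documentation recommends.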

=========================================
So, here is what I did.

I uploaded the zip to Google Drive, unzipped it to '/content/gdrive/My Drive/casia_dataset/', and generated the CSV in Google Colab with the following code.

Is it right?

import os

path_orig = '/content/gdrive/My Drive/casia_dataset/CASIA2/Au/'   # Au = authentic (real)
path_modif = '/content/gdrive/My Drive/casia_dataset/CASIA2/Tp/'  # Tp = tampered (fake)

strings = []

# Authentic images: label 0 (real), per the CSV format above
for file in os.listdir(path_orig):
    try:
        # keep only JPEGs larger than 10,000 bytes
        if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
            strings.append(path_orig + file + ',0\n')
    except OSError:
        print(path_orig + file)  # report files that could not be read

# Tampered images: label 1 (fake)
for file in os.listdir(path_modif):
    try:
        if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
            strings.append(path_modif + file + ',1\n')
    except OSError:
        print(path_modif + file)

# Open the CSV once in 'w' mode, so re-running the cell does not append duplicate rows
with open('/content/gdrive/My Drive/casia_dataset/dataset.csv', 'w') as f:
    f.writelines(strings)
@agusgun
Owner

agusgun commented Mar 8, 2019

Yup, I think that is correct.

As for the datasets, I think Au stands for Authentic, while Tp stands for Tampered. Hope this helps.

If you have already solved this issue, please close it :). Thank you very much.
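Following that reading (Au = authentic, Tp = tampered), the two nearly identical loops in the script above can be collapsed into one by mapping each folder name to a label. Because the paths are built from `root/CASIA2`, the duplicate `__MACOSX` tree is never read. A sketch under two assumptions: labels follow the CSV format from the first post (0 = real, 1 = fake), and the same filters are kept (names ending in `jpg`, size above 10,000 bytes). `LABELS` and `collect_rows` are illustrative names:

```python
import os

# Assumed mapping, following the CSV format in the first post: 0 = real, 1 = fake
LABELS = {'Au': 0, 'Tp': 1}

def collect_rows(root, subset='CASIA2', min_size=10000):
    """Return (path, label) pairs for JPEGs under root/subset/Au and root/subset/Tp."""
    rows = []
    for folder, label in LABELS.items():
        folder_path = os.path.join(root, subset, folder)
        for name in sorted(os.listdir(folder_path)):
            path = os.path.join(folder_path, name)
            # same filters as the scripts in this thread: 'jpg' suffix, larger than min_size bytes
            if name.endswith('jpg') and os.path.getsize(path) > min_size:
                rows.append((path, label))
    return rows
```

For example, `collect_rows('/content/gdrive/My Drive/casia_dataset')` would return the rows for the Colab layout described in the first post, ready to be written out as `path,label` lines.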

@agusgun
Owner

agusgun commented Mar 8, 2019 via email

@pidugusundeep

What are the images in Sp?

@DenisSouth
Copy link
Author

DenisSouth commented Dec 3, 2019

Quoting @pidugusundeep: "What are the images in Sp?"

Au contains the authentic pictures.
Tp contains the tampered pictures.

To make the CSV for training:

import os

path_orig = 'casia/CASIA2/Au/'   # Au = authentic (real)
path_modif = 'casia/CASIA2/Tp/'  # Tp = tampered (fake)

strings = []

# Authentic images get label 0 (real), tampered images label 1 (fake),
# matching the CSV format in the first post
for file in os.listdir(path_orig):
    # keep only JPEGs larger than 10,000 bytes
    if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
        strings.append(path_orig + file + ',0\n')

for file in os.listdir(path_modif):
    if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
        strings.append(path_modif + file + ',1\n')

# Write the CSV once in 'w' mode, so re-running does not append duplicate rows
with open('casia/dataset.csv', 'w') as f:
    f.writelines(strings)
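The first post also asked which data to use for training and which for testing. The thread does not answer that, and both scripts put every CASIA2 image into a single CSV, so any hold-out split has to be made separately. One option, an assumption rather than necessarily what this repository does, is a seeded, stratified random split of the `(path, label)` rows, so that train and test keep the same real/fake ratio. `stratified_split` is a hypothetical helper:

```python
import random
from collections import defaultdict

def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (path, label) pairs into train/test lists with the same label proportions."""
    by_label = defaultdict(list)
    for path, label in rows:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Example: 10 authentic (0) and 5 tampered (1) rows -> 2 + 1 rows held out
rows = [(f'Au_{i}.jpg', 0) for i in range(10)] + [(f'Tp_{i}.jpg', 1) for i in range(5)]
train, test = stratified_split(rows)
print(len(train), len(test))  # 12 3
```

The two lists can then be written to separate CSV files in the same `path,label` format.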

@pidugusundeep

@DenisSouth What are the images in Sp? What kind of images are they?

@DenisSouth
Author

Quoting @pidugusundeep: "What are the images in Sp? What kind of images are they?"

https://www.kaggle.com/sophatvathana/casia-dataset

They are modified (tampered) JPEG images.
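The Kaggle archive mixes the real data with a `__MACOSX` copy, which is metadata that macOS adds when it zips a folder; its files are not the images themselves. A quick way to see what each folder (Au, Sp, Tp, ela) actually holds before building a CSV is to count image files per directory while skipping `__MACOSX`. A sketch; `count_images` is a hypothetical helper, and the extension list is an assumption, so adjust it if a folder uses other formats:

```python
import os
from collections import Counter

# Assumed set of image extensions (compared case-insensitively)
IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.tif', '.tiff', '.bmp')

def count_images(root):
    """Count image files per directory under root, ignoring __MACOSX metadata folders."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # pruning dirnames in place stops os.walk from descending into __MACOSX
        dirnames[:] = [d for d in dirnames if d != '__MACOSX']
        n = sum(1 for f in filenames if f.lower().endswith(IMAGE_EXTS))
        if n:
            counts[os.path.relpath(dirpath, root)] = n
    return counts
```

Printing `count_images('casia')` lists each folder with its image count, which makes empty or unexpected folders easy to spot.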
