When using with `Faker`, there are two ways of using the providers.
Recommended way
_static/examples/recipes/imports_and_init_1.py
See the full example here <_static/examples/recipes/imports_and_init_1.py>
But this works too
_static/examples/recipes/imports_and_init_2.py
See the full example here <_static/examples/recipes/imports_and_init_2.py>
Throughout the documentation, these two approaches are mixed.
- Content of the file is `Lorem ipsum`.
_static/examples/recipes/txt_file_1.py
See the full example here <_static/examples/recipes/txt_file_1.py>
- Content is generated dynamically.
- Content is limited to 1024 chars.
- Wrap lines after 80 chars.
- Prefix the filename with `zzz`.
_static/examples/recipes/docx_file_1.py
See the full example here <_static/examples/recipes/docx_file_1.py>
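To make the options above concrete, here is a minimal standard-library sketch of what limiting content to 1024 characters, wrapping lines after 80 characters and prefixing the filename amount to. It does not use the faker-file API and writes plain text rather than DOCX. The helper `make_text_file`, its parameter names and the filler text are hypothetical and exist only for illustration.

```python
import os
import tempfile
import textwrap


def make_text_file(content, max_chars=1024, wrap_after=80, prefix="zzz"):
    """Write truncated, wrapped content to a temp file with a prefixed name.

    Hypothetical helper: it mimics the options described above and is
    not part of faker-file.
    """
    limited = content[:max_chars]  # limit the content length
    wrapped = textwrap.fill(limited, width=wrap_after)  # wrap long lines
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(wrapped)
    return path


path = make_text_file("lorem ipsum dolor sit amet " * 100)
```

The resulting file name starts with `zzz`, no line exceeds 80 characters, and the total length stays within the 1024-character limit.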
- 5 TXT files in the ZIP archive (default value is 5).
- Content of all files is `Lorem ipsum`.
_static/examples/recipes/zip_file_1.py
See the full example here <_static/examples/recipes/zip_file_1.py>
- 3 DOCX files in the ZIP archive.
- Content is generated dynamically.
- Content is limited to 1024 chars.
- Prefix the filenames in the archive with `xxx_`.
- Prefix the filename of the archive itself with `zzz`.
- Inside the ZIP, put all files in the directory `yyy`.
_static/examples/recipes/zip_file_2.py
See the full example here <_static/examples/recipes/zip_file_2.py>
- 9 DOCX files in the ZIP archive.
- Content is generated dynamically from a given template.
_static/examples/recipes/zip_file_3.py
See the full example here <_static/examples/recipes/zip_file_3.py>
Create a ZIP file which contains 5 ZIP files which contain 5 ZIP files which contain 5 DOCX files.
- 5 ZIP files in the ZIP archive.
- Content is generated dynamically.
- Prefix the filenames in the archive with `nested_level_1_`.
- Prefix the filename of the archive itself with `nested_level_0_`.
- Each of the ZIP files inside the ZIP file in turn contains 5 other ZIP files, prefixed with `nested_level_2_`, which in turn contain 5 DOCX files.
_static/examples/recipes/zip_file_4.py
See the full example here <_static/examples/recipes/zip_file_4.py>
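The nesting described above can be pictured with a short standard-library sketch. It does not use the faker-file API: `build_nested_zip` is a hypothetical helper, and the innermost members are placeholder text files rather than DOCX files. Only the `nested_level_N_` prefixes and the count of 5 come from the recipe.

```python
import io
import zipfile


def build_nested_zip(levels, level=1, count=5):
    """Return the bytes of a ZIP archive nested `levels` deep.

    Members at nesting level N are named `nested_level_N_<index>.zip`.
    Below the last level, small placeholder text files are written
    (a simplification for this sketch; the recipe uses DOCX files).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for index in range(count):
            if level > levels:
                archive.writestr(f"file_{index}.txt", "Lorem ipsum")
            else:
                inner = build_nested_zip(levels, level + 1, count)
                archive.writestr(f"nested_level_{level}_{index}.zip", inner)
    return buffer.getvalue()


# A ZIP with 5 ZIPs, each holding 5 ZIPs, each holding 5 files.
outer = build_nested_zip(levels=2)
```

Building each inner archive in memory and writing it as a member of its parent keeps the recursion simple and avoids temporary files.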
- 50 files in the ZIP archive (limited to DOCX, EPUB and TXT types).
- Content is generated dynamically.
- Prefix the filename of the archive itself with `zzz_archive_`.
- Inside the ZIP, put all files in the directory `zzz`.
_static/examples/recipes/zip_file_5.py
See the full example here <_static/examples/recipes/zip_file_5.py>
- 3 files in the ZIP archive (1 DOCX, and 2 XML types).
- Content is generated dynamically.
- Filename of the archive itself is `alice-looking-through-the-glass.zip`.
- Files inside the archive have fixed names (passed with the `basename` argument).
_static/examples/recipes/zip_file_6.py
See the full example here <_static/examples/recipes/zip_file_6.py>
Note that the `count` argument (not shown in the example, but commonly accepted by inner functions) is simply ignored here.
- 5 TXT files in the EML email (default value is 5).
- Content of all files is `Lorem ipsum`.
_static/examples/recipes/eml_file_1.py
See the full example here <_static/examples/recipes/eml_file_1.py>
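For readers unfamiliar with the EML format, the following standard-library sketch shows what "5 TXT files in the EML email" means structurally: a MIME message with five attachments. It does not use the faker-file API. `build_eml`, the header values and the attachment names are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_eml(count=5, content="Lorem ipsum"):
    """Return the bytes of an email with `count` text attachments.

    Hypothetical helper for illustration; not part of faker-file.
    """
    message = EmailMessage()
    message["Subject"] = "Sketch"
    message["From"] = "sender@example.com"
    message["To"] = "recipient@example.com"
    message.set_content("See attachments.")
    for index in range(count):
        message.add_attachment(
            content.encode("utf-8"),
            maintype="text",
            subtype="plain",
            filename=f"file_{index}.txt",
        )
    return message.as_bytes()


eml_bytes = build_eml()

# Parse the bytes back and list the attachments.
parsed = message_from_bytes(eml_bytes, policy=policy.default)
attachments = list(parsed.iter_attachments())
filenames = [part.get_filename() for part in attachments]
```

Writing `eml_bytes` to a file with the `.eml` extension yields a message that mail clients can open.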
- 3 DOCX files in the EML email.
- Content is generated dynamically.
- Content is limited to 1024 chars.
- Prefix the filenames in the email with `xxx_`.
- Prefix the filename of the email itself with `zzz`.
_static/examples/recipes/eml_file_2.py
See the full example here <_static/examples/recipes/eml_file_2.py>
Create an EML file which contains 5 EML files which contain 5 EML files which contain 5 DOCX files.
- 5 EML files in the EML file.
- Content is generated dynamically.
- Prefix the filenames in the EML email with `nested_level_1_`.
- Prefix the filename of the EML email itself with `nested_level_0_`.
- Each of the EML files inside the EML file in turn contains 5 other EML files, prefixed with `nested_level_2_`, which in turn contain 5 DOCX files.
_static/examples/recipes/eml_file_3.py
See the full example here <_static/examples/recipes/eml_file_3.py>
- 10 files in the EML file (limited to DOCX, EPUB and TXT types).
- Content is generated dynamically.
- Prefix the filename of the EML itself with `zzz`.
_static/examples/recipes/eml_file_4.py
See the full example here <_static/examples/recipes/eml_file_4.py>
- Content template is predefined and contains dynamic fixtures.
- Wrap lines after 80 chars.
_static/examples/recipes/pdf_file_1.py
See the full example here <_static/examples/recipes/pdf_file_1.py>
When pre-defined templating and dynamic fixtures are not enough and full control is needed, you can use the `DynamicTemplate` wrapper. It takes a list of content modifiers (tuples): `(func: Callable, kwargs: dict)`. Each callable should accept the following arguments:
- `provider`: Faker `Generator` instance or `Faker` instance.
- `document`: Document instance. Implementation specific.
- `data`: Dictionary. Used primarily for observability.
- `counter`: Integer. Index number of the content modifier.
- `**kwargs`: Dictionary. Useful to pass implementation-specific arguments.
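The calling convention listed above can be sketched in plain Python. This is not the library's implementation. The `apply_modifiers` loop, the list used as a stand-in document and both modifier functions are hypothetical; only the argument names (`provider`, `document`, `data`, `counter`, `**kwargs`) and the `(func, kwargs)` tuple shape come from the description.

```python
from typing import Any, Callable, Dict, List, Tuple

Modifier = Tuple[Callable[..., None], Dict[str, Any]]


def add_paragraph(provider, document, data, counter, **kwargs):
    """Content modifier: append a paragraph to the stand-in document."""
    text = kwargs.get("content", "Lorem ipsum")
    document.append(("paragraph", text))
    # Record what was added, for observability.
    data.setdefault("content_modifiers", {})[f"add_paragraph_{counter}"] = text


def add_table(provider, document, data, counter, **kwargs):
    """Content modifier: append a rows x cols table of placeholder cells."""
    rows, cols = kwargs.get("rows", 2), kwargs.get("cols", 2)
    table = [[f"r{r}c{c}" for c in range(cols)] for r in range(rows)]
    document.append(("table", table))
    data.setdefault("content_modifiers", {})[f"add_table_{counter}"] = table


def apply_modifiers(provider, document, modifiers: List[Modifier]):
    """Call each modifier in order, passing its index as `counter`."""
    data: Dict[str, Any] = {}
    for counter, (func, kwargs) in enumerate(modifiers):
        func(provider, document, data, counter, **kwargs)
    return data


document: list = []  # stand-in for a real document object
data = apply_modifiers(
    None,  # a Faker instance would be passed here
    document,
    [(add_paragraph, {"content": "Hello"}), (add_table, {"rows": 3, "cols": 2})],
)
```

Each modifier mutates the document in place and records what it did in `data`, which is why `data` is described as being used primarily for observability.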
The following example shows how to generate a DOCX file with paragraph, table and image.
_static/examples/recipes/docx_file_mixed_1.py
See the full example here <_static/examples/recipes/docx_file_mixed_1.py>
Similarly to the previous section, the following example shows how to generate an ODT file with table and image.
_static/examples/recipes/odt_file_mixed_1.py
See the full example here <_static/examples/recipes/odt_file_mixed_1.py>
_static/examples/recipes/pdf_file_reportlab_1.py
See the full example here <_static/examples/recipes/pdf_file_reportlab_1.py>
Note that, at the moment, `pdfkit` is the default generator. However, you could set it explicitly as follows:
_static/examples/recipes/pdf_file_pdfkit_1.py
See the full example here <_static/examples/recipes/pdf_file_pdfkit_1.py>
A graphic PDF file does not contain text. Don't use it when you need text-based content. However, sometimes you just need a valid file in PDF format, without caring much about the content. That's where `GraphicPdfFileProvider` comes to the rescue:
_static/examples/recipes/pdf_file_pillow_1.py
See the full example here <_static/examples/recipes/pdf_file_pillow_1.py>
The generated file will contain a random graphic (consisting of lines and shapes of different colours). One of the most useful arguments supported is `size`.
_static/examples/recipes/pdf_file_pillow_2.py
See the full example here <_static/examples/recipes/pdf_file_pillow_2.py>
Files generated by graphic file providers do not contain text. Don't use them when you need text-based content. However, sometimes you just need a valid image file with graphics of a certain size. That's where graphic file providers help.
Supported file formats are: BMP, GIF, ICO, JPEG, PDF, PNG, SVG, TIFF and WEBP.
_static/examples/recipes/graphic_ico_file_1.py
See the full example here <_static/examples/recipes/graphic_ico_file_1.py>
_static/examples/recipes/graphic_jpeg_file_1.py
See the full example here <_static/examples/recipes/graphic_jpeg_file_1.py>
_static/examples/recipes/graphic_png_file_1.py
See the full example here <_static/examples/recipes/graphic_png_file_1.py>
_static/examples/recipes/graphic_webp_file_1.py
See the full example here <_static/examples/recipes/graphic_webp_file_1.py>
_static/examples/recipes/mp3_file_1.py
See the full example here <_static/examples/recipes/mp3_file_1.py>
_static/examples/recipes/mp3_file_gtts_1.py
See the full example here <_static/examples/recipes/mp3_file_gtts_1.py>
You can tune arguments too:
_static/examples/recipes/mp3_file_gtts_2.py
See the full example here <_static/examples/recipes/mp3_file_gtts_2.py>
Refer to https://gtts.readthedocs.io/en/latest/module.html#languages-gtts-lang for the list of accepted values for the `lang` argument.
Refer to https://gtts.readthedocs.io/en/latest/module.html#localized-accents for the list of accepted values for the `tld` argument.
_static/examples/recipes/mp3_file_edge_tts_1.py
See the full example here <_static/examples/recipes/mp3_file_edge_tts_1.py>
You can tune arguments too:
_static/examples/recipes/mp3_file_edge_tts_2.py
See the full example here <_static/examples/recipes/mp3_file_edge_tts_2.py>
Run `edge-tts -l` from the terminal for a list of available voices.
The default MP3 generator class is `GttsMp3Generator`, which uses Google Text-to-Speech services to generate an MP3 file from given or randomly generated text. It does not require additional services to run and the only dependency here is the `gtts` package. You can, however, implement your own custom MP3 generator class and pass it to the `mp3_file` method in the `mp3_generator_cls` argument instead of the default `GttsMp3Generator`. Read about quotas of Google Text-to-Speech services here.
Usage with custom MP3 generator class.
_static/examples/recipes/mp3_file_custom_1.py
See the full example here <_static/examples/recipes/mp3_file_custom_1.py>
See the exact implementation of `marytts_mp3_generator` in the examples.
- Create an exact copy of the randomly picked file under a different name.
- Prefix of the destination file would be `zzz`.
- `source_dir_path` is the absolute path to the directory to pick files from.
_static/examples/recipes/random_file_from_dir_1.py
See the full example here <_static/examples/recipes/random_file_from_dir_1.py>
- Create an exact copy of a file under a different name.
- Prefix of the destination file would be `zzz`.
- `path` is the absolute path to the file to copy.
_static/examples/recipes/file_from_path_1.py
See the full example here <_static/examples/recipes/file_from_path_1.py>
The only two file types for which it is easy to foresee the file size are BIN and TXT. Note that the size of BIN files is always exact, while for TXT it is approximate.
_static/examples/recipes/file_of_size_bin_1.py
See the full example here <_static/examples/recipes/file_of_size_bin_1.py>
_static/examples/recipes/file_of_size_txt_1.py
See the full example here <_static/examples/recipes/file_of_size_txt_1.py>
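The difference between exact and approximate sizes can be illustrated with a standard-library sketch. It does not use the faker-file API and makes an assumption about why text sizes are approximate: here, text is assembled from whole words, so it can only approach the target length. Both helper functions and the word list are hypothetical.

```python
import os
import random
import tempfile

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def bin_file_of_size(size):
    """Write exactly `size` random bytes to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as handle:
        handle.write(os.urandom(size))
    return path


def txt_file_of_size(size, seed=0):
    """Write whole words until one more would exceed `size` characters.

    Assumed behaviour for this sketch: the result is at most `size`
    characters, and usually slightly fewer.
    """
    rng = random.Random(seed)
    parts, length = [], 0
    while True:
        word = rng.choice(WORDS)
        extra = len(word) + (1 if parts else 0)  # account for the space
        if length + extra > size:
            break
        parts.append(word)
        length += extra
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="ascii") as handle:
        handle.write(" ".join(parts))
    return path


bin_path = bin_file_of_size(1024)
txt_path = txt_file_of_size(1024)
bin_size = os.path.getsize(bin_path)
txt_size = os.path.getsize(txt_path)
```

In this sketch `bin_size` always equals the requested 1024 bytes, while `txt_size` falls at or just below it, by less than the length of one word plus a space.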
- Use template.
- Generate 10 DOCX files.
_static/examples/recipes/files_multiprocessing_1.py
See the full example here <_static/examples/recipes/files_multiprocessing_1.py>
_static/examples/recipes/files_multiprocessing_2.py
See the full example here <_static/examples/recipes/files_multiprocessing_2.py>
See the following example:
_static/examples/recipes/augment_file_from_dir_1.py
See the full example here <_static/examples/recipes/augment_file_from_dir_1.py>
The generated file will resemble the text of the original document, but will not be the same. This is useful when you don't want to test on text generated by `Faker`, but rather on something that makes more sense for your use case, while still ensuring uniqueness of the documents.
The following file types are supported:
- DOCX
- EML
- EPUB
- ODT
- PDF
- RTF
- TXT
By default, all supported files are eligible for random selection. You could, however, narrow that list by providing the `extensions` argument:
_static/examples/recipes/augment_file_from_dir_2.py
See the full example here <_static/examples/recipes/augment_file_from_dir_2.py>
The actual augmentation of texts is delegated to an abstraction layer of text augmenters. Currently, two augmenters are implemented. The default one is based on textaugment (which is in turn based on nltk); it is very lightweight and speedy, but produces less accurate results. The other one is based on nlpaug, which is far more sophisticated, but at the cost of speed.
By default, the `bert-base-multilingual-cased` model is used, which is pretrained on the top 104 languages with the largest Wikipedia using a masked language modeling (MLM) objective. If you want to use a different model, specify the proper identifier in the `model_path` argument. Some well-working options for `model_path` are:
- `bert-base-multilingual-cased`
- `bert-base-multilingual-uncased`
- `bert-base-cased`
- `bert-base-uncased`
- `bert-base-german-cased`
- `GroNLP/bert-base-dutch-cased`
_static/examples/recipes/augment_file_from_dir_3.py
See the full example here <_static/examples/recipes/augment_file_from_dir_3.py>
Refer to the `nlpaug` docs and check the Textual augmenters examples.
_static/examples/recipes/augment_file_from_dir_4.py
See the full example here <_static/examples/recipes/augment_file_from_dir_4.py>
If you pass the `raw=True` argument to any provider or inner function, instead of creating a file, you will get `bytes` back (or, to be totally correct, a `bytes`-like object `BytesValue`, which is basically bytes enriched with meta-data). You could then use the `bytes` content of the file to build a test payload as shown in the example test below:
_static/examples/recipes/raw_1.py
See the full example here <_static/examples/recipes/raw_1.py>
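The idea of "bytes enriched with meta-data" can be sketched without the library. The `RawBytes` class below is a hypothetical stand-in, not the actual `BytesValue` implementation. The `data` attribute name, the `filename` key and the payload layout are assumptions made for this illustration.

```python
import io


class RawBytes(bytes):
    """Hypothetical bytes subclass that carries a meta-data dictionary."""

    def __new__(cls, content, data=None):
        instance = super().__new__(cls, content)
        instance.data = dict(data or {})  # meta-data travels with the bytes
        return instance


raw = RawBytes(b"Lorem ipsum", {"filename": "zzz_example.txt"})

# Because the value is still `bytes`, it can be wrapped in a file-like
# object and placed into a test payload.
upload = io.BytesIO(raw)
upload.name = raw.data["filename"]
payload = {"name": "Example", "file": upload}
```

A subclass of `bytes` behaves like ordinary bytes wherever bytes are expected, so test clients that accept file-like objects or raw content need no special handling.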
If you want to generate a file in a format that is not (yet) supported, you can try to use `GenericFileProvider`. In the following example, an HTML file is generated from a template.
_static/examples/recipes/generic_file_1.py
See the full example here <_static/examples/recipes/generic_file_1.py>
_static/examples/recipes/aws_s3_storage_1.py
See the full example here <_static/examples/recipes/aws_s3_storage_1.py>
Depending on the ORM or framework you're using, you might want to tweak the `root_path` and `rel_path` values, especially if you store files in directories (like `your-bucket-name/path/to/the/file.ext`).
For instance, if you use `Django` and `django-storages`, and want to store the files inside the `/user/uploads` directory, the following would be correct:
_static/examples/recipes/aws_s3_storage_2.py
See the full example here <_static/examples/recipes/aws_s3_storage_2.py>
_static/examples/recipes/google_cloud_storage_1.py
See the full example here <_static/examples/recipes/google_cloud_storage_1.py>
Similarly to `AWSS3Storage`, if you use `Django` and `django-storages`, and want to store the files inside the `/user/uploads` directory, the following would be correct:
_static/examples/recipes/google_cloud_storage_2.py
See the full example here <_static/examples/recipes/google_cloud_storage_2.py>
_static/examples/recipes/sftp_storage_1.py
See the full example here <_static/examples/recipes/sftp_storage_1.py>
When used with Django (to generate fake data with `factory_boy` factories), the `root_path` argument of the corresponding file storage shall be provided. Otherwise (although no errors will be triggered) the generated files will reside outside the `MEDIA_ROOT` directory (by default in `/tmp/` on Linux) and further operations with those files through Django will cause a `SuspiciousOperation` exception.
_static/examples/recipes/factory_boy_models_1.py
See the full example here <_static/examples/recipes/factory_boy_models_1.py>
_static/examples/recipes/factory_boy_factory_1.py
And then somewhere in your code:
_static/examples/recipes/factory_boy_factory_1.py
See the full example here <_static/examples/recipes/factory_boy_factory_1.py>
_static/examples/recipes/factory_boy_factory_2.py
And then somewhere in your code:
_static/examples/recipes/factory_boy_factory_2.py
See the full example here <_static/examples/recipes/factory_boy_factory_2.py>
_static/examples/recipes/factory_boy_factory_3.py
See the full example here <_static/examples/recipes/factory_boy_factory_3.py>
Faker example with AWS S3 storage
_static/examples/recipes/aws_s3_storage_3.py
See the full example here <_static/examples/recipes/aws_s3_storage_3.py>
factory-boy example with AWS S3 storage
_static/examples/recipes/aws_s3_storage_4.py
See the full example here <_static/examples/recipes/aws_s3_storage_4.py>
Flexible storage selection
_static/examples/recipes/flexible_storage_1.py
See the full example here <_static/examples/recipes/flexible_storage_1.py>