Data loss while writing avro file to s3 compatible storage #209
Comments
Thank you for providing example source. Could you please describe what the problem is in detail? An example of the data loss, as well as how you detected it, would be very helpful. |
I converted the Avro file to JSON and counted the number of lines. I expected the number of lines in the JSON to match the number of lines in the CSV, but it didn't. |
Have you confirmed that this problem still persists if you use local storage instead of smart_open? |
Yes, this problem doesn't exist if I use local storage. However, the file on local storage was not written through smart_open. |
From your example source, I can see that you're reading the CSV file with smart_open too. Is the CSV file being read correctly? In other words, does the problem still persist if the CSV file is loaded from local storage? |
Yes, the file is being read correctly. I counted the number of lines in the dataframe I loaded, and I also counted the DatumWriter count. They match, but after writing the Avro file to S3 and reading it back, I see that there is data loss. This confirms that the problem is with writing binary data to S3. |
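For illustration, a minimal sketch of this kind of check, counting DataFrame rows against records read back from the written Avro file; the file names here are hypothetical, not the reporter's actual paths:

import avro.datafile
import avro.io
import pandas as pd

# Hypothetical paths: 'test.csv' is the source data, 'output.avro' is the file that was written out.
data = pd.read_csv('test.csv', dtype='str').fillna('NA')
num_csv_rows = len(data.index)

with open('output.avro', 'rb') as fin:
    reader = avro.datafile.DataFileReader(fin, avro.io.DatumReader())
    num_avro_rows = sum(1 for _ in reader)
    reader.close()

# A mismatch here is the "data loss" being reported.
print(num_csv_rows, num_avro_rows)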
What version of smart_open are you using? Have you observed a similar problem with older versions? |
smart_open 1.3.5 with the Anaconda version of Python 2.7. |
I haven't tried with other versions |
OK, thank you for providing detailed information about this bug. We will investigate. |
thanks |
Hi, do you have any update? |
Sorry, no, I have not looked into this yet. Is this urgent for you? |
Yes, this is a blocker for us. We are also looking for an alternative to smart_open but haven't found one yet. At least a workaround ASAP would be great. |
I think the workaround is to write the avro file to local storage first, and upload to S3 when it is complete. I've started investigating the issue, but cannot reproduce the problem because your code sample is incomplete: the gen_schema function is missing. Could you please look at this file: https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py and update it so that it reproduces your problem? |
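For illustration, a minimal sketch of that workaround under assumed paths and bucket names, using the smart_open 1.x API seen elsewhere in this thread: write and close the Avro file locally, then stream the finished file to S3.

import shutil
import smart_open

def upload_finished_avro(local_path, s3_url):
    # The local file is already written and closed, so all Avro buffers
    # and metadata are on disk before the upload begins.
    with open(local_path, 'rb') as fin, smart_open.smart_open(s3_url, 'wb') as fout:
        shutil.copyfileobj(fin, fout)

# upload_finished_avro('local.avro', 's3://my-bucket/path/data.avro')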
Thanks for investigating the issue. Writing to local storage is not an option for us because of the storage constraints and latency it involves. |
Where is the updated file? |
I have updated the code in the test_209.py you provided. It shows as pushed. |
I think your push failed. Here’s the file in your repo: https://github.com/vinuthna91/smart_open/blob/209/integration-tests/test_209.py It’s unchanged from my version. If you can’t work it out, please paste the code here as a comment. |
I changed it. I am sorry for the delayed response; I was out of town. https://github.com/vinuthna91/smart_open/blob/209/integration-tests/test_209.py

def gen_schema(paramNames):
    paramNamesLen = len(paramNames)
    dataName = 'schema'
    avroSchemaOut = "{\n\t\"type\": \"record\", \"name\": \"%s\", \"namespace\": \"com.sandisk.bigdata\", \n \t\"fields\": [" % (dataName)
    if paramNamesLen == 0:
        # no parameters, no schema file generation
        avroSchemaOut = ''
    else:
        # generate the schema string field by field
        for ii in range(paramNamesLen):
            typeString = "[\"%s\", \"null\"]" % ('String')
            schemaString = "{ \"name\":\"%s\", \"type\":%s, \"default\":null}" % (paramNames[ii], typeString)
            if ii == 0:
                avroSchemaOut += schemaString + ',\n'
            elif ii < len(paramNames) - 1:
                avroSchemaOut += "\t\t\t" + schemaString + ',\n'
            else:
                avroSchemaOut += "\t\t\t" + schemaString + '\n'
        avroSchemaOut += "\n \t\t\t]\n}"
    return avroSchemaOut |
I pulled your changes and made some updates. Unfortunately, the code crashes with an error unrelated to smart_open:
Please have a look and update the code. The updated code is here: https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py |
Thank you for the update. Unfortunately, I still cannot reproduce your problem because of errors in avro.
Can you please make sure the test runs and reproduces your actual problem? |
Can you give me the CSV you used? |
The CSV gets downloaded as part of the actual test. Please see the source file. |
Could you please check now? I updated the code. |
Were you able to try it? Please confirm that the code is working now. |
Yes, the code is working now. I was able to use it to reproduce the problem. I also simplified it a bit to rule out errors in the actual test: https://github.com/mpenkov/smart_open/blob/209/integration-tests/test_209.py I'll take it from here. Thanks for your help in reproducing the problem. |
Without seeing your actual test case (source code or at least shell commands), it's hard for me to comment. I ran an additional test to ensure that there is no data loss when reading files from S3:
I downloaded the CSV file using 1) smart_open 2) AWS CLI and compared the two files. They were identical, so there is no data loss in the reading case, either. |
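For reference, a rough sketch of that comparison, with hypothetical paths; 'cli-copy.csv' is assumed to have been downloaded separately with the AWS CLI:

import hashlib
import smart_open

with smart_open.smart_open('s3://my-bucket/path/test.csv', 'rb') as fin:
    via_smart_open = fin.read()

with open('cli-copy.csv', 'rb') as fin:
    via_cli = fin.read()

# Identical digests mean nothing was lost or altered on the read path.
print(hashlib.md5(via_smart_open).hexdigest() == hashlib.md5(via_cli).hexdigest())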
I used the same code you provided, except that I used my local data set. My CSV file is only 90 KB. On line number 72, I have added #writer.close(). When writer.close() is uncommented, writing local.avro to local storage completes and there is no data loss. However, writing to S3 storage throws an error: AttributeError: 'S3OpenWrite' object has no attribute 'flush'. When you comment it out (writer.close() is not used), local.avro and remote.avro are both the same, but there is data loss in both files compared to the CSV data. |
I don't know how |
Oh, the truncation is just because of the |
I don't see truncation in the number of rows in the pandas dataframe; I see it only in the Avro file. I am counting the number of records written to the local Avro file when we use writer.close() and when we do not. |
When I read in all the data using |
When writer.close() is called, it internally calls a flush() method that flushes the current state of the file, including metadata. The error I am getting is 'S3OpenWrite' object has no attribute 'flush', because of which the metadata is being lost. |
I don't see that exception from the test script, but yeah, that would make sense. |
@scottbelden That is exactly the data loss I am talking about. If you uncomment writer.close() on line 72, you should see the exception while writing the Avro file to S3. |
I'm not sure which script has |
I made the change to the test script. If you do not see the change, you need to add writer.close() to the write_avro function definition: def write_avro(foutd): |
OK, I think see what's going on here. I'll look at this again on the weekend. |
Thanks! |
Could you find anything? |
It's still Thursday. Weekend starts in two days. |
Okay. |
OK, it's the weekend, and I've had a look at it. Time to clear things up. The "data loss" referred to in this ticket does not come from smart_open. It comes from misusing avro. You need to either call writer.close() explicitly once you have appended all the records, or use the DataFileWriter as a context manager (a with block).
Both work identically. If you don't do one of them, avro keeps some data in its internal buffers and never writes it to the output file. However, if you do the above while using smart_open, avro ends up calling BufferedOutputWriter.close twice. The first time succeeds, but the second time fails due to a bug in that method. I've added a test and fixed that bug. Strangely, I haven't been able to reproduce the errors related to the absence of a flush method. @vinuthna91, can you reproduce the error with the new code? |
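For reference, a minimal sketch of the two equivalent patterns; the schema, rows, and S3 URLs are placeholders, not the actual test script:

import avro.datafile
import avro.io
import avro.schema
import smart_open

schema_json = '{"type": "record", "name": "row", "fields": [{"name": "value", "type": ["null", "string"], "default": null}]}'
schema = avro.schema.parse(schema_json)
rows = [{'value': 'a'}, {'value': 'b'}]

# Pattern 1: explicit close() -- flushes avro's internal buffers and writes the file metadata.
with smart_open.smart_open('s3://my-bucket/explicit-close.avro', 'wb') as fout:
    writer = avro.datafile.DataFileWriter(fout, avro.io.DatumWriter(), schema)
    for row in rows:
        writer.append(row)
    writer.close()

# Pattern 2: context manager -- close() is called automatically on exit.
with smart_open.smart_open('s3://my-bucket/context-manager.avro', 'wb') as fout:
    with avro.datafile.DataFileWriter(fout, avro.io.DatumWriter(), schema) as writer:
        for row in rows:
            writer.append(row)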
@mpenkov Thanks for looking into it. Unfortunately, I used the same code and it still throws the error about the absence of a flush method.
Since I get the error, I also added another method that does not close the writer manually. This is how I am trying to write now:

def write_avro_no_manual_close(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    writer = avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema)
    for ll, row in enumerate(dictRes):
        writer.append(row)
    #writer.close()

def write_avro_context_manager(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    with avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema) as writer:
        for ll, row in enumerate(dictRes):
            writer.append(row)

@mock.patch('avro.datafile.DataFileWriter.generate_sync_marker', mock.Mock(return_value=b'0123456789abcdef'))
def write_avro_manual_close(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    writer = avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema)
    for ll, row in enumerate(dictRes):
        writer.append(row)
    writer.close()

with open('local.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_context_manager(foutd)

with open('local-so.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_manual_close(foutd)

with open('local-nomanual.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_no_manual_close(foutd)

I clearly see that local.avro and local-so.avro are identical, but not identical to local-nomanual.avro. I am also pasting the full code I am using for your reference:

import os
import os.path as P
import avro.io
import avro.datafile
import pandas as pn
import smart_open
import six
import subprocess
import warnings
import boto
from boto.compat import urlsplit, six
import boto.s3.connection
import logging
import json

###### Added by me
access_key_id = access_key_id
secret_access_key = secret_access_key
port = port
hostname = hostname
##########

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    import pandas as pn

logging.basicConfig(level=logging.ERROR)

if six.PY3:
    assert False, 'this code only runs on Py2.7'

_S3_URL = 's3://bucket-for-testing/user/vinuthna'
assert _S3_URL is not None, 'please set the SO_S3_URL environment variable'

'''
_NUMROWS = os.environ.get('SO_NUMROWS')
if _NUMROWS is not None:
    _NUMROWS = int(_NUMROWS)
'''

def gen_schema(data):
    schema = {
        'type': 'record', 'name': 'data', 'namespace': 'namespace',
        'fields': [
            {'name': field, 'type': ['null', 'string'], 'default': None}
            for field in data.columns
        ]
    }
    return json.dumps(schema, indent=4)

def write_avro_nomanualclose(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    writer_nomanual = avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema)
    for ll, row in enumerate(dictRes):
        writer_nomanual.append(row)
    #writer_manual.close()

def write_avro_context_manager(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    with avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema) as writer_contextManager:
        for ll, row in enumerate(dictRes):
            writer_contextManager.append(row)

def write_avro_manual_close(foutd):
    schema = avro.schema.parse(avroSchemaOut)
    dictRes = data.to_dict(orient='records')
    writer_manual = avro.datafile.DataFileWriter(foutd, avro.io.DatumWriter(), schema)
    for ll, row in enumerate(dictRes):
        writer_manual.append(row)
    writer_manual.close()

inputFilePath = P.join(_S3_URL, 'test.csv')
splitInputDir = urlsplit(inputFilePath, allow_fragments=False)

#logging.info(">>> establishing connection for inputDir")
#establish connection manually for customized options
inConn = boto.connect_s3(
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    port=int(port),
    host=hostname,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

#get bucket
inbucket = inConn.get_bucket(splitInputDir.netloc)

# read in the csv file
kr = inbucket.get_key(splitInputDir.path)
assert kr is not None, 'File not present'
with smart_open.smart_open(kr, 'r') as fin:
    data = pn.read_csv(fin, header=1, error_bad_lines=False, dtype='str').fillna('NA')

num_csv_rows = len(data.index)
avroSchemaOut = gen_schema(data)

#No dataloss
with open('local.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_context_manager(foutd)

###No dataloss
with open('local-so.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_manual_close(foutd)

## Data loss observed
with open('local-nomanual.avro', 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_nomanualclose(foutd)

##Below works
outputfilepath = _S3_URL + '/nomanual.avro'
splitoutdir = urlsplit(outputfilepath, allow_fragments=False)
kw = inbucket.get_key(splitoutdir.path, validate=False)
with smart_open.smart_open(kw, 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_nomanualclose(foutd)

### This throws error
outputfilepath = _S3_URL + '/outcontext.avro'
splitoutdir = urlsplit(outputfilepath, allow_fragments=False)
kw = inbucket.get_key(splitoutdir.path, validate=False)
with smart_open.smart_open(kw, 'wb') as foutd:
    logging.critical('writing to %r', foutd)
    write_avro_context_manager(foutd) |
Also, in your test code, try reading the CSV file from S3, because my test case reads a CSV from S3, converts it to Avro, and writes the Avro back to S3. |
Whoa... Please format your code next time. Besides, most of that code is not relevant, because it demonstrates facts that we've already established (if you don't close the Avro writer, you'll lose some data).
Thank you for giving me details about your test case, but I don't think this is relevant to the problem. What version of avro are you using? |
Sure, I will format it. I am using version 2.0 |
That doesn't sound right.
The newest version in pip is 1.8.2. The current version on apache.org is also 1.8.2. |
Sorry, it is 1.8.0. I misread it. |
I tried to reproduce the problem with 1.8.0 but couldn't. In the immediate future, you could try mocking out the flush method:

# completely untested code!
with smart_open.smart_open(s3_url, 'wb') as fout:
    fout.flush = lambda: None
    write_avro(data, fout)

I'll look at adding flush support to the API. |
Thanks. It says the object is read-only. Traceback (most recent call last): |
Sounds like you're overriding flush for the wrong object.
This means that flush already exists for that particular object. |
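If it helps, an untested sketch of the same shim applied only when the file object really lacks flush, so an object that already has one is left alone; the URL is a placeholder:

import smart_open

s3_url = 's3://my-bucket/path/out.avro'  # placeholder URL
with smart_open.smart_open(s3_url, 'wb') as fout:
    if not hasattr(fout, 'flush'):
        fout.flush = lambda: None  # only patch objects that are missing flush
    write_avro(data, fout)  # write_avro and data as in the earlier snippet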
Got it. It's working now. Thanks a lot for the workaround :) |
Closed via #212 |
Hi,
I am converting a CSV file into Avro and writing it to S3-compliant storage. I see that the schema file (.avsc) is written properly. However, there is data loss while writing the .avro file.
Below is a snippet of my code